Preface


This book is designed for an introductory course on formal languages, automata,
computability, and related matters. These topics form a major part of what is known as the
theory of computation. A course on this subject matter is now standard in the computer
science curriculum and is often taught fairly early in the program. Hence, the prospective
audience for this book consists primarily of sophomores and juniors majoring in computer
science or computer engineering.


Prerequisites for the material in this book are a knowledge of some higher-level programming
language (commonly C, C++, or Java™) and familiarity with the fundamentals of data structures and 
algorithms. A course in discrete mathematics that includes set theory, functions, relations, logic, and 
elements of mathematical reasoning is essential. Such a course is part of the standard introductory 
computer science curriculum. 


The study of the theory of computation has several purposes, most importantly (1) to familiarize 
students with the foundations and principles of computer science, (2) to teach material that is useful in 
subsequent courses, and (3) to strengthen students’ ability to carry out formal and rigorous 
mathematical arguments. The presentation I have chosen for this text favors the first two purposes, 
although I would argue that it also serves the third. To present ideas clearly and to give students 
insight into the material, the text stresses intuitive motivation and illustration of ideas through 
examples. When there is a choice, I prefer arguments that are easily grasped to those that are concise 
and elegant but difficult in concept. I state definitions and theorems precisely and give the motivation 
for proofs, but often leave out the routine and tedious details. I believe that this is desirable for 
pedagogical reasons. Many proofs are unexciting applications of induction or contradiction with 
differences that are specific to particular problems. Presenting such arguments in full detail is not 
only unnecessary, but interferes with the flow of the story. Therefore, quite a few of the proofs are 
brief and someone who insists on completeness may consider them lacking in detail. I do not see this 
as a drawback. Mathematical skills are not the byproduct of reading someone else's arguments, but 
come from thinking about the essence of a problem, discovering ideas suitable to make the point, then 
carrying them out in precise detail. The latter skill certainly has to be learned, and I think that the
proof sketches in this text provide very appropriate starting points for such a practice. 


Computer science students sometimes view a course in the theory of computation as unnecessarily 
abstract and of no practical consequence. To convince them otherwise, one needs to appeal to their 
specific interests and strengths, such as tenacity and inventiveness in dealing with hard-to-solve 
problems. Because of this, my approach emphasizes learning through problem solving. 


By a problem-solving approach, I mean that students learn the material primarily through 
problem-type illustrative examples that show the motivation behind the concepts, as well as their 
connection to the theorems and definitions. At the same time, the examples may involve a nontrivial 
aspect, for which students must discover a solution. In such an approach, homework exercises
contribute to a major part of the learning process. The exercises at the end of each section are 
designed to illuminate and illustrate the material and call on students’ problem-solving ability at 
various levels. Some of the exercises are fairly simple, picking up where the discussion in the text
leaves off and asking students to carry on for another step or two. Other exercises are very difficult,
challenging even the best minds. The more difficult exercises are marked with a star. A good mix of 
such exercises can be a very effective teaching tool. Students need not be asked to solve all problems, 
but should be assigned those that support the goals of the course and the viewpoint of the instructor. 
Computer science curricula differ from institution to institution; while a few emphasize the theoretical 
side, others are almost entirely oriented toward practical application. I believe that this text can serve 
either of these extremes, provided that the exercises are selected carefully with the students’ 
background and interests in mind. At the same time, the instructor needs to inform the students about 
the level of abstraction that is expected of them. This is particularly true of the proof-oriented 
exercises. When I say “prove that” or “show that,” I have in mind that the student should think about 
how a proof can be constructed and then produce a clear argument. How formal such a proof should 
be needs to be determined by the instructor, and students should be given guidelines on this early in 
the course. 


The content of the text is appropriate for a one-semester course. Most of the material can be 
covered, although some choice of emphasis will have to be made. In my classes, I generally gloss 
over proofs, giving just enough coverage to make the result plausible, and then ask students to read 
the rest on their own. Overall, though, little can be skipped entirely without potential difficulties later 
on. A few sections, which are marked with an asterisk, can be omitted without loss to later material. 
Most of the material, however, is essential and must be covered. 


The fifth edition of this text introduces a substantial amount of new material. While the 
presentation in the fourth edition has been retained with only minor modifications, two appendices 
have been added. The first is an entire chapter on finite-state transducers, Appendix A. While 
transducers play no significant role in formal language theory, they are important in other areas of 
computer science, such as digital design. Students can benefit from an early exposure to this subject; 
if time permits it is worthwhile to do so. Due to the similarity with finite accepters, this involves few 
new concepts. 


I also added an introduction to JFLAP, an interactive software tool that I feel is of great help in 
both learning the material and in teaching this course. JFLAP implements most of the ideas and 
constructions in this book. It not only helps students visualize abstract concepts, but it is also a great 
time-saver. Many of the exercises in this book require creating structures that are complicated and that
have to be thoroughly tested for correctness. JFLAP can reduce the time required for this by an order 
of magnitude. Appendix B gives a brief introduction to JFLAP, and the CD that comes with the
book expands on this. I very much recommend the use of JFLAP for both students and instructors.


Peter Linz 


Chapter 1 


Introduction to 
the Theory of 
Computation 


The subject matter of this book, the theory of computation, includes several topics: automata
theory, formal languages and grammars, computability, and complexity. Together, this
material constitutes the theoretical foundation of computer science. Loosely speaking, we
can think of automata, grammars, and computability as the study of what can be done by
computers in principle, while complexity addresses what can be done in practice. In this
book we focus almost entirely on the first of these concerns. We will study various automata, see how
they are related to languages and grammars, and investigate what can and cannot be done by digital
computers. Although this theory has many uses, it is inherently abstract and mathematical.


Computer science is a practical discipline. Those who work in it often have a marked preference 
for useful and tangible problems over theoretical speculation. This is certainly true of computer 
science students who are concerned mainly with difficult applications from the real world. 
Theoretical questions interest them only if they help in finding good solutions. This attitude is 
appropriate, since without applications there would be little interest in computers. But given this 
practical orientation, one might well ask “why study theory?” 


The first answer is that theory provides concepts and principles that help us understand the 
general nature of the discipline. The field of computer science includes a wide range of special 
topics, from machine design to programming. The use of computers in the real world involves a 
wealth of specific detail that must be learned for a successful application. This makes computer 
science a very diverse and broad discipline. But in spite of this diversity, there are some common 
underlying principles. To study these basic principles, we construct abstract models of computers and 
computation. These models embody the important features that are common to both hardware and 
software and that are essential to many of the special and complex constructs we encounter while 
working with computers. Even when such models are too simple to be applicable immediately to 
real-world situations, the insights we gain from studying them provide the foundation on which 
specific development is based. This approach is, of course, not unique to computer science. The 
construction of models is one of the essentials of any scientific discipline, and the usefulness of a 
discipline is often dependent on the existence of simple, yet powerful, theories and laws. 

A second, and perhaps not so obvious, answer is that the ideas we will discuss have some 
immediate and important applications. The fields of digital design, programming languages, and 
compilers are the most obvious examples, but there are many others. The concepts we study here run 
like a thread through much of computer science, from operating systems to pattern recognition. 


The third answer is one of which we hope to convince the reader. The subject matter is
intellectually stimulating and fun. It provides many challenging, puzzle-like problems that can lead to
some sleepless nights. This is problem solving in its pure essence.


In this book, we will look at models that represent features at the core of all computers and their 
applications. To model the hardware of a computer, we introduce the notion of an automaton (plural, 
automata). An automaton is a construct that possesses all the indispensable features of a digital 
computer. It accepts input, produces output, may have some temporary storage, and can make 
decisions in transforming the input into the output. A formal language is an abstraction of the general 
characteristics of programming languages. A formal language consists of a set of symbols and some 
rules of formation by which these symbols can be combined into entities called sentences. A formal 
language is the set of all sentences permitted by the rules of formation. Although some of the formal 
languages we study here are simpler than programming languages, they have many of the same 
essential features. We can learn a great deal about programming languages from formal languages. 
Finally, we will formalize the concept of a mechanical computation by giving a precise definition of 
the term algorithm and study the kinds of problems that are (and are not) suitable for solution by such 
mechanical means. In the course of our study, we will show the close connection between these 
abstractions and investigate the conclusions we can derive from them. 


In the first chapter, we look at these basic ideas in a very broad way to set the stage for later 
work. In Section 1.1, we review the main ideas from mathematics that will be required. While 
intuition will frequently be our guide in exploring ideas, the conclusions we draw will be based on 
rigorous arguments. This will involve some mathematical machinery, although the requirements are 
not extensive. The reader will need a reasonably good grasp of the terminology and of the elementary 
results of set theory, functions, and relations. Trees and graph structures will be used frequently, 
although little is needed beyond the definition of a labeled, directed graph. Perhaps the most stringent 
requirement is the ability to follow proofs and an understanding of what constitutes proper 
mathematical reasoning. This includes familiarity with the basic proof techniques of deduction, 
induction, and proof by contradiction. We will assume that the reader has this necessary background. 
Section 1.1 is included to review some of the main results that will be used and to establish a 
notational common ground for subsequent discussion. 


In Section 1.2, we take a first look at the central concepts of languages, grammars, and automata. 
These concepts occur in many specific forms throughout the book. In Section 1.3, we give some 
simple applications of these general ideas to illustrate that these concepts have widespread uses in 
computer science. The discussion in these two sections will be intuitive rather than rigorous. Later, 
we will make all of this much more precise; but for the moment, the goal is to get a clear picture of 
the concepts with which we are dealing. 


1.1 Mathematical Preliminaries and Notation 


Sets 


A set is a collection of elements, without any structure other than membership. To indicate that $x$ is an
element of the set $S$, we write $x \in S$. The statement that $x$ is not in $S$ is written $x \notin S$. A set can be
specified by enclosing some description of its elements in curly braces; for example, the set of
integers 0, 1, 2 is shown as

$$S = \{0, 1, 2\}.$$


Ellipses are used whenever the meaning is clear. Thus, {a, b,..., z} stands for all the lowercase 
letters of the English alphabet, while {2, 4, 6,...} denotes the set of all positive even integers. When 
the need arises, we use more explicit notation, in which we write 


$$S = \{i : i > 0,\ i \text{ is even}\} \tag{1.1}$$

for the last example. We read this as “$S$ is the set of all $i$, such that $i$ is greater than zero, and $i$ is
even,” implying, of course, that $i$ is an integer.


The usual set operations are union ($\cup$), intersection ($\cap$), and difference ($-$) defined as

$$S_1 \cup S_2 = \{x : x \in S_1 \text{ or } x \in S_2\},$$
$$S_1 \cap S_2 = \{x : x \in S_1 \text{ and } x \in S_2\},$$
$$S_1 - S_2 = \{x : x \in S_1 \text{ and } x \notin S_2\}.$$


Another basic operation is complementation. The complement of a set $S$, denoted by $\overline{S}$, consists
of all elements not in $S$. To make this meaningful, we need to know what the universal set $U$ of all
possible elements is. If $U$ is specified, then

$$\overline{S} = \{x : x \in U,\ x \notin S\}.$$


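The set with no elements, called the empty set or the null set, is denoted by $\emptyset$. From the
definition of a set, it is obvious that

$$S \cup \emptyset = S - \emptyset = S,$$
$$S \cap \emptyset = \emptyset,$$
$$\overline{\emptyset} = U,$$
$$\overline{\overline{S}} = S.$$

The following useful identities, known as DeMorgan's laws,

$$\overline{S_1 \cup S_2} = \overline{S_1} \cap \overline{S_2}, \tag{1.2}$$
$$\overline{S_1 \cap S_2} = \overline{S_1} \cup \overline{S_2}, \tag{1.3}$$

are needed on several occasions.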
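These operations can be tried out directly in a language with a built-in set type. The following is a minimal sketch in Python; the universal set `U`, the sets `S1` and `S2`, and the helper `complement` are our own illustrative choices and not part of any library.

```python
# Universal set and two subsets; the concrete values are arbitrary.
U = set(range(10))
S1 = {0, 1, 2, 3}
S2 = {2, 3, 4, 5}

def complement(s, universe=U):
    """Complement of s relative to the given universal set."""
    return universe - s

union = S1 | S2           # elements in S1 or in S2
intersection = S1 & S2    # elements in both S1 and S2
difference = S1 - S2      # elements in S1 but not in S2

# Complementing a union gives the intersection of the complements,
# and complementing an intersection gives the union of the complements.
law_union = complement(S1 | S2) == complement(S1) & complement(S2)
law_intersection = complement(S1 & S2) == complement(S1) | complement(S2)
print(union, intersection, difference, law_union, law_intersection)
```

A check on one instance is not a proof; it only illustrates the identities.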


A set $S_1$ is said to be a subset of $S$ if every element of $S_1$ is also an element of $S$. We write this as

$$S_1 \subseteq S.$$

If $S_1 \subseteq S$, but $S$ contains an element not in $S_1$, we say that $S_1$ is a proper subset of $S$; we write this as

$$S_1 \subset S.$$


If $S_1$ and $S_2$ have no common element, that is, $S_1 \cap S_2 = \emptyset$, then the sets are said to be disjoint.


A set is said to be finite if it contains a finite number of elements; otherwise it is infinite. The 
size of a finite set is the number of elements in it; this is denoted by $|S|$.


A given set normally has many subsets. The set of all subsets of a set $S$ is called the powerset of
$S$ and is denoted by $2^S$. Observe that $2^S$ is a set of sets.


Example 1.1 


If $S$ is the set $\{a, b, c\}$, then its powerset is

$$2^S = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\}.$$

Here $|S| = 3$ and $|2^S| = 8$. This is an instance of a general result; if $S$ is finite, then

$$|2^S| = 2^{|S|}.$$
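For small sets the powerset can be enumerated mechanically. The sketch below assumes Python and uses `itertools.combinations` from the standard library; the function name `powerset` is our own. Subsets are stored as `frozenset` values because ordinary Python sets cannot be elements of other sets.

```python
from itertools import chain, combinations

def powerset(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    # Collect the subsets of size 0, 1, ..., len(items).
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

S = {"a", "b", "c"}
P = powerset(S)
print(len(P))  # 8
```

Enumeration takes time proportional to the number of subsets, so this is practical only for small sets.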


In many of our examples, the elements of a set are ordered sequences of elements from other sets.
Such sets are said to be the Cartesian product of other sets. For the Cartesian product of two sets,
which itself is a set of ordered pairs, we write

$$S = S_1 \times S_2 = \{(x, y) : x \in S_1,\ y \in S_2\}.$$

Example 1.2 


Let $S_1 = \{2, 4\}$ and $S_2 = \{2, 3, 5, 6\}$. Then

$$S_1 \times S_2 = \{(2, 2), (2, 3), (2, 5), (2, 6), (4, 2), (4, 3), (4, 5), (4, 6)\}.$$

Note that the order in which the elements of a pair are written matters. The pair $(4, 2)$ is in $S_1 \times S_2$,
but $(2, 4)$ is not.

The notation is extended in an obvious fashion to the Cartesian product of more than two sets;
generally

$$S_1 \times S_2 \times \cdots \times S_n = \{(x_1, x_2, \ldots, x_n) : x_i \in S_i\}.$$
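As an illustration, Cartesian products of finite sets can be formed with `itertools.product` from the Python standard library. The sketch below reuses the two sets of Example 1.2; the three-fold product at the end is our own addition to show the general case.

```python
from itertools import product

S1 = {2, 4}
S2 = {2, 3, 5, 6}

# All ordered pairs (x, y) with x in S1 and y in S2.
pairs = set(product(S1, S2))
print(len(pairs))                          # 8
print((4, 2) in pairs, (2, 4) in pairs)    # True False

# The same function accepts more than two sets.
triples = set(product(S1, S2, S1))
print(len(triples))                        # 16
```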


A set can be divided by separating it into a number of subsets. Suppose that $S_1, S_2, \ldots, S_n$ are subsets
of a given set $S$ and that the following holds:

1. The subsets $S_1, S_2, \ldots, S_n$ are mutually disjoint;
2. $S_1 \cup S_2 \cup \cdots \cup S_n = S$;
3. none of the $S_i$ is empty.

Then $S_1, S_2, \ldots, S_n$ is called a partition of $S$.
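The three conditions translate directly into a test on finite sets. The following Python sketch is one way to write it; the function name `is_partition` and the sample sets are hypothetical.

```python
from itertools import combinations

def is_partition(blocks, s):
    """Check the three partition conditions for the subsets in `blocks` against the set `s`."""
    blocks = [set(b) for b in blocks]
    # Condition 1: the subsets are mutually disjoint.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    # Condition 2: the union of the subsets is s.
    covers = set().union(*blocks) == set(s)
    # Condition 3: none of the subsets is empty.
    nonempty = all(len(b) > 0 for b in blocks)
    return disjoint and covers and nonempty

print(is_partition([{1, 2}, {3}, {4, 5}], {1, 2, 3, 4, 5}))  # True
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))             # False
```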


Functions and Relations 


A function is a rule that assigns to elements of one set a unique element of another set. If $f$ denotes a
function, then the first set is called the domain of $f$, and the second set is its range. We write

$$f : S_1 \to S_2$$

to indicate that the domain of $f$ is a subset of $S_1$ and that the range of $f$ is a subset of $S_2$. If the domain
of $f$ is all of $S_1$, we say that $f$ is a total function on $S_1$; otherwise $f$ is said to be a partial function.


In many applications, the domain and range of the functions involved are in the set of positive 
integers. Furthermore, we are often interested only in the behavior of these functions as their 
arguments become very large. In such cases an understanding of the growth rates may suffice and a 
common order-of-magnitude notation can be used. Let $f(n)$ and $g(n)$ be functions whose domain is a
subset of the positive integers. If there exists a positive constant $c$ such that for all sufficiently large $n$

$$f(n) \le c\,|g(n)|,$$

we say that $f$ has order at most $g$. We write this as

$$f(n) = O(g(n)).$$

If

$$|f(n)| \ge c\,|g(n)|,$$

then $f$ has order at least $g$, for which we use

$$f(n) = \Omega(g(n)).$$

Finally, if there exist constants $c_1$ and $c_2$ such that

$$c_1 |g(n)| \le |f(n)| \le c_2 |g(n)|,$$

$f$ and $g$ have the same order of magnitude, expressed as

$$f(n) = \Theta(g(n)).$$

In this order-of-magnitude notation, we ignore multiplicative constants and lower-order terms that
become negligible as $n$ increases.


Example 1.3 
Let

$$f(n) = 2n^2 + 3n,$$
$$g(n) = n^3,$$
$$h(n) = 10n^2 + 100.$$

Then

$$f(n) = O(g(n)),$$
$$g(n) = \Omega(h(n)),$$
$$f(n) = \Theta(h(n)).$$
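The claims of this example can be made plausible numerically. The Python sketch below checks the defining inequalities over a finite sample of arguments, taking $f(n) = 2n^2 + 3n$, $g(n) = n^3$, and $h(n) = 10n^2 + 100$. The constants ($c = 1$ for the first two claims, $c_1 = 1/10$ and $c_2 = 1$ for the third) and the range starting at 11 are our own choices; a finite sample does not replace the short algebraic argument.

```python
def f(n):
    return 2 * n**2 + 3 * n

def g(n):
    return n**3

def h(n):
    return 10 * n**2 + 100

ns = range(11, 1000)  # "sufficiently large" n, sampled

# f has order at most g: f(n) <= 1 * g(n).
f_at_most_g = all(f(n) <= g(n) for n in ns)
# g has order at least h: g(n) >= 1 * h(n).
g_at_least_h = all(g(n) >= h(n) for n in ns)
# f and h have the same order: (1/10) h(n) <= f(n) <= 1 * h(n),
# written with integers as h(n) <= 10 f(n) <= 10 h(n).
f_same_as_h = all(h(n) <= 10 * f(n) <= 10 * h(n) for n in ns)

print(f_at_most_g, g_at_least_h, f_same_as_h)  # True True True
```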


In order-of-magnitude notation, the symbol $=$ should not be interpreted as equality, and order-of-magnitude
expressions cannot be treated like ordinary expressions. Manipulations such as

$$O(n) + O(n) = 2O(n)$$

are not sensible and can lead to incorrect conclusions. Still, if used properly, the order-of-magnitude
arguments can be effective, as we will see in later chapters.


Some functions can be represented by a set of pairs

$$\{(x_1, y_1), (x_2, y_2), \ldots\},$$

where $x_i$ is an element in the domain of the function, and $y_i$ is the corresponding value in its range. For
such a set to define a function, each $x_i$ can occur at most once as the first element of a pair. If this is
not satisfied, the set is called a relation. Relations are more general than functions: In a function each
element of the domain has exactly one associated element in the range; in a relation there may be
several such elements in the range.
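The distinction can be tested mechanically for a finite set of pairs. In the Python sketch below, the name `is_function` and the sample data are hypothetical.

```python
def is_function(pairs):
    """True if no first component is paired with two different second components."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # x occurs as the first element of two distinct pairs
        seen[x] = y
    return True

print(is_function({(1, "a"), (2, "b"), (3, "a")}))  # True
print(is_function({(1, "a"), (1, "b")}))            # False
```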


One kind of relation is that of equivalence, a generalization of the concept of equality (identity).
To indicate that a pair $(x, y)$ is in an equivalence relation, we write

$$x \equiv y.$$

A relation denoted by $\equiv$ is considered an equivalence if it satisfies three rules: the reflexivity rule

$$x \equiv x \text{ for all } x;$$

the symmetry rule

$$\text{if } x \equiv y, \text{ then } y \equiv x;$$

and the transitivity rule

$$\text{if } x \equiv y \text{ and } y \equiv z, \text{ then } x \equiv z.$$

Example 1.4


On the set of nonnegative integers, we can define a relation

$$x \equiv y$$

if and only if

$$x \bmod 3 = y \bmod 3.$$

Then $2 \equiv 5$, $12 \equiv 0$, and $0 \equiv 36$. Clearly this is an equivalence relation, as it satisfies reflexivity,
symmetry, and transitivity.


If S is a set on which we have a defined equivalence relation, then we can use this equivalence to 
partition the set into equivalence classes. Each equivalence class contains all and only equivalent 
elements. 
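When an equivalence is defined by comparing some computed value, as with the remainder modulo 3 in Example 1.4, the equivalence classes of a finite set can be found by grouping elements on that value. The following Python sketch assumes this situation; the function name `equivalence_classes` and the sample set are our own.

```python
from collections import defaultdict

def equivalence_classes(elements, key):
    """Group elements so that x and y share a class exactly when key(x) == key(y)."""
    classes = defaultdict(set)
    for x in elements:
        classes[key(x)].add(x)
    return list(classes.values())

# Integers 0..9 under the relation of Example 1.4.
classes = equivalence_classes(range(10), key=lambda x: x % 3)
print(classes)
```

The resulting classes are mutually disjoint, nonempty, and together cover the original set, so they form a partition.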


Graphs and Trees 


A graph is a construct consisting of two finite sets, the set $V = \{v_1, v_2, \ldots, v_n\}$ of vertices and the set
$E = \{e_1, e_2, \ldots, e_m\}$ of edges. Each edge is a pair of vertices from $V$, for instance,

$$e_i = (v_j, v_k)$$

is an edge from $v_j$ to $v_k$. We say that the edge $e_i$ is an outgoing edge for $v_j$ and an incoming edge for
$v_k$. Such a construct is actually a directed graph (digraph), since we associate a direction (from $v_j$ to
$v_k$) with each edge. Graphs may be labeled, a label being a name or other information associated with
parts of the graph. Both vertices and edges may be labeled.

Graphs are conveniently visualized by diagrams in which the vertices are represented as circles 
and the edges as lines with arrows connecting the vertices. The graph with vertices $\{v_1, v_2, v_3\}$ and
edges $\{(v_1, v_3), (v_3, v_1), (v_3, v_2), (v_3, v_3)\}$ is depicted in Figure 1.1.

A sequence of edges $(v_i, v_j), (v_j, v_k), \ldots, (v_m, v_n)$ is said to be a walk from $v_i$ to $v_n$. The length of a
walk is the total number of edges traversed in going from the initial vertex to the final one. A walk in
which no edge is repeated is said to be a path; a path is simple if no vertex is repeated. A walk from
$v_i$ to itself with no repeated edges is called a cycle with base $v_i$. If no vertices other than the base are
repeated in a cycle, then it is said to be simple. In Figure 1.1, $(v_1, v_3), (v_3, v_2)$ is a simple path from $v_1$
to $v_2$. The sequence of edges $(v_1, v_3), (v_3, v_3), (v_3, v_1)$ is a cycle, but not a simple one. If the edges of a
graph are labeled, we can talk about the label of a walk. This label is the sequence of edge labels
encountered when the path is traversed. Finally, an edge from a vertex to itself is called a loop. In
Figure 1.1, there is a loop on vertex $v_3$.


Figure 1.1 


On several occasions, we will refer to an algorithm for finding all simple paths between two
given vertices (or all simple cycles based on a vertex). If we do not concern ourselves with
efficiency, we can use the following obvious method. Starting from the given vertex, say $v_i$, list all
outgoing edges $(v_i, v_k), (v_i, v_l), \ldots$. At this point, we have all paths of length one starting at $v_i$. For all
vertices $v_k, v_l, \ldots$ so reached, we list all outgoing edges as long as they do not lead to any vertex
already used in the path we are constructing. After we do this, we will have all simple paths of length
two originating at $v_i$. We continue this until all possibilities are accounted for. Since there are only a
finite number of vertices, we will eventually list all simple paths beginning at $v_i$. From these we
select those ending at the desired vertex.
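The method just described can be sketched in a few lines of Python. The representation is an assumption on our part: a graph is given as a list of edge pairs, a path is stored as a list of vertices, and the function name `simple_paths` is hypothetical. The sketch handles paths between two distinct vertices only; simple cycles would need a small variation that allows a return to the starting vertex.

```python
def simple_paths(edges, start, goal):
    """List all simple paths from start to goal, each as a list of vertices.

    Paths are extended one edge at a time, never revisiting a vertex
    already on the path, so all simple paths of length 1, then 2, and so on
    are generated in turn.
    """
    outgoing = {}
    for u, v in edges:
        outgoing.setdefault(u, []).append(v)

    found = []
    frontier = [[start]]            # all simple paths of the current length
    while frontier:
        next_frontier = []
        for path in frontier:
            for v in outgoing.get(path[-1], []):
                if v in path:       # would repeat a vertex, so not simple
                    continue
                extended = path + [v]
                next_frontier.append(extended)
                if v == goal:
                    found.append(extended)
        frontier = next_frontier
    return found

# The graph of Figure 1.1 as listed in the text.
edges = [("v1", "v3"), ("v3", "v1"), ("v3", "v2"), ("v3", "v3")]
print(simple_paths(edges, "v1", "v2"))  # [['v1', 'v3', 'v2']]
```

The loop terminates because a simple path cannot contain more vertices than the graph has, so the frontier eventually becomes empty.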

Trees are a particular type of graph. A tree is a directed graph that has no cycles, and that has one 
distinct vertex, called the root, such that there is exactly one path from the root to every other vertex. 
This definition implies that the root has no incoming edges and that there are some vertices without 
outgoing edges. These are called the leaves of the tree. If there is an edge from $v_i$ to $v_j$, then $v_i$ is said
to be the parent of $v_j$, and $v_j$ the child of $v_i$. The level associated with each vertex is the number of
edges in the path from the root to the vertex. The height of the tree is the largest level number of any
vertex. These terms are illustrated in Figure 1.2.


At times, we want to associate an ordering with the nodes at each level; in such cases we talk 
about ordered trees. 
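The notions of level, height, and leaf are easy to compute once a tree is stored as a mapping from each parent to its children. The Python sketch below assumes such a representation; the function name `vertex_levels` and the small tree are hypothetical and do not reproduce the tree of Figure 1.2.

```python
def vertex_levels(children, root):
    """Map each vertex of a tree to its level (edges on the path from the root)."""
    level = {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            level[c] = level[v] + 1   # a child is one level below its parent
            stack.append(c)
    return level

# Hypothetical tree: r has children a and b, a has child c, c has children d and e.
children = {"r": ["a", "b"], "a": ["c"], "c": ["d", "e"]}
level = vertex_levels(children, "r")
height = max(level.values())                               # largest level number
leaves = sorted(v for v in level if not children.get(v))   # no outgoing edges
print(level, height, leaves)
```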


Figure 1.2 (tree diagram with labels: Root, Leaf, Level 3, Height = 3)


More details on graphs and trees can be found in most books on discrete mathematics. 


Proof Techniques 


An important requirement for reading this text is the ability to follow proofs. In mathematical 
arguments, we employ the accepted rules of deductive reasoning, and many proofs are simply a 
sequence of such steps. Two special proof techniques are used so frequently that it is appropriate to 
review them briefly. These are proof by induction and proof by contradiction. 


Induction is a technique by which the truth of a number of statements can be inferred from the truth
of a few specific instances. Suppose we have a sequence of statements $P_1, P_2, \ldots$ we want to prove to
be true. Furthermore, suppose also that the following holds:

1. For some $k \ge 1$, we know that $P_1, P_2, \ldots, P_k$ are true.
2. The problem is such that for any $n \ge k$, the truths of $P_1, P_2, \ldots, P_n$ imply the truth of $P_{n+1}$.

We can then use induction to show that every statement in this sequence is true.


In a proof by induction, we argue as follows: From Condition 1 we know that the first $k$
statements are true. Then Condition 2 tells us that $P_{k+1}$ also must be true. But now that we know that
the first $k + 1$ statements are true, we can apply Condition 2 again to claim that $P_{k+2}$ must be true, and
so on. We need not explicitly continue this argument, because the pattern is clear. The chain of
reasoning can be extended to any statement. Therefore, every statement is true.

The starting statements $P_1, P_2, \ldots, P_k$ are called the basis of the induction. The step connecting $P_n$
with $P_{n+1}$ is called the inductive step. The inductive step is generally made easier by the inductive
assumption that $P_1, P_2, \ldots, P_n$ are true; we then argue that the truth of these statements guarantees the truth
of $P_{n+1}$. In a formal inductive argument, we show all three parts explicitly.


Example 1.5 


A binary tree is a tree in which no parent can have more than two children. Prove that a binary tree of
height $n$ has at most $2^n$ leaves.

Proof: If we denote the maximum number of leaves of a binary tree of height $n$ by $l(n)$, then we want
to show that $l(n) \le 2^n$.

Basis: Clearly $l(0) = 1 = 2^0$ since a tree of height 0 can have no nodes other than the root, that is, it
has at most one leaf.

Inductive Assumption:

$$l(i) \le 2^i, \text{ for } i = 0, 1, \ldots, n. \tag{1.4}$$

Inductive Step: To get a binary tree of height $n + 1$ from one of height $n$, we can create, at most, two
leaves in place of each previous one. Therefore,

$$l(n + 1) = 2\,l(n).$$

Now, using the inductive assumption, we get

$$l(n + 1) \le 2 \times 2^n = 2^{n+1}.$$

Thus, if our claim is true for $n$, it must also be true for $n + 1$. Since $n$ can be any number, the statement
must be true for all $n$. ■

Here we introduce the symbol ■ that is used in this book to denote the end of a proof.


Inductive reasoning can be difficult to grasp. It helps to notice the close connection between 
induction and recursion in programming. For example, the recursive definition of a function $f(n)$,
where $n$ is any positive integer, often has two parts. One involves the definition of $f(n + 1)$ in terms of
$f(n), f(n - 1), \ldots, f(1)$. This corresponds to the inductive step. The second part is the “escape” from
the recursion, which is accomplished by defining $f(1), f(2), \ldots, f(k)$ nonrecursively. This
corresponds to the basis of induction. As in induction, recursion allows us to draw conclusions about 
all instances of the problem, given only a few starting values and using the recursive nature of the 
problem. 
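As a concrete illustration of this correspondence, consider the Fibonacci sequence, defined by $f(n + 2) = f(n + 1) + f(n)$ with $f(1) = f(2) = 1$. The Python sketch below is one way to write it; the use of `functools.lru_cache` to avoid recomputing values is our own choice and is not essential to the idea.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci numbers with f(1) = f(2) = 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n <= 2:
        return 1                      # the "escape": values given nonrecursively
    return fib(n - 1) + fib(n - 2)    # f(n) in terms of earlier values

print([fib(n) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

The two base cases play the role of the basis with $k = 2$, and the last line of the function plays the role of the inductive step.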


Sometimes, a problem looks difficult until we look at it in just the right way. Often looking at it 
recursively simplifies matters greatly. 


Example 1.6 


A set $l_1, l_2, \ldots, l_n$ of mutually intersecting straight lines divides the plane into a number of separated
regions. A single line divides the plane into two parts, two lines generate four regions, three lines
make seven regions, and so on. This is easily checked visually for up to three lines, but as the number
of lines increases it becomes difficult to spot a pattern. Let us try to solve this problem recursively.

Look at Figure 1.3 to see what happens if we add a new line $l_{n+1}$ to existing $n$ lines. The region to
the left of $l_1$ is divided into two new regions, so is the region to the left of $l_2$, and so on until we get to
the last line. At the last line, the region to the right of $l_n$ is also divided. Each of the $n$ intersections
then generates one new region, with one extra at the end. So, if we let $A(n)$ denote the number of
regions generated by $n$ lines, we see that

$$A(n + 1) = A(n) + n + 1, \quad n = 1, 2, \ldots,$$

with $A(1) = 2$. From this simple recursion we then calculate $A(2) = 4$, $A(3) = 7$, $A(4) = 11$, and so
on.

Figure 1.3


To get a formula for $A(n)$ and to show that it is correct, we use induction. If we conjecture that

$$A(n) = \frac{n(n + 1)}{2} + 1,$$

then

$$A(n + 1) = \frac{n(n + 1)}{2} + 1 + n + 1 = \frac{(n + 1)(n + 2)}{2} + 1$$

justifies the inductive step. The basis is easily checked, completing the argument.
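A quick numerical comparison can catch algebra slips before a proof is attempted. The Python sketch below evaluates the recursion $A(n + 1) = A(n) + n + 1$ with $A(1) = 2$ and compares it with the closed form $n(n + 1)/2 + 1$; the function names are our own, and agreement on finitely many values is only a sanity check, not a substitute for the induction.

```python
def regions_recursive(n):
    """A(n) computed from A(1) = 2 and A(k + 1) = A(k) + k + 1."""
    a = 2
    for k in range(1, n):
        a = a + k + 1
    return a

def regions_closed(n):
    """The conjectured closed form n(n + 1)/2 + 1."""
    return n * (n + 1) // 2 + 1

print([regions_recursive(n) for n in range(1, 5)])  # [2, 4, 7, 11]
print(all(regions_recursive(n) == regions_closed(n) for n in range(1, 200)))  # True
```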


In this example we have been a little less formal in identifying the basis, inductive assumption, 
and inductive step, but they are there and are essential. To keep our subsequent discussions from 
becoming too formal, we will generally prefer the style of this second example. However, if you have 
difficulty in following or constructing a proof, go back to the more explicit form of Example 1.5. 


Proof by contradiction is another powerful technique that often works when everything else fails. 
Suppose we want to prove that some statement P is true. We then assume, for the moment, that P is 
false and see where that assumption leads us. If we arrive at a conclusion that we know is incorrect, 
we can lay the blame on the starting assumption and conclude that P must be true. The following is a 
classic and elegant example. 


Example 1.7 


A rational number is a number that can be expressed as the ratio of two integers $n$ and $m$ so that $n$ and
$m$ have no common factor. A real number that is not rational is said to be irrational. Show that $\sqrt{2}$ is
irrational.

As in all proofs by contradiction, we assume the contrary of what we want to show. Here we
assume that $\sqrt{2}$ is a rational number so that it can be written as

$$\sqrt{2} = \frac{n}{m}, \tag{1.5}$$

where $n$ and $m$ are integers without a common factor. Rearranging (1.5), we have

$$2m^2 = n^2.$$

Therefore, $n^2$ must be even. This implies that $n$ is even, so that we can write $n = 2k$ or

$$2m^2 = 4k^2,$$

and

$$m^2 = 2k^2.$$

Therefore, $m$ is even. But this contradicts our assumption that $n$ and $m$ have no common factors. Thus,
$m$ and $n$ in (1.5) cannot exist and $\sqrt{2}$ is not a rational number.


This example exhibits the essence of a proof by contradiction. By making a certain assumption we
are led to a contradiction of the assumption or some known fact. If all steps in our argument are
logically sound, we must conclude that our initial assumption was false.


EXERCISES 


1. Use induction on the size of $S$ to show that if $S$ is a finite set, then $|2^S| = 2^{|S|}$.

2. Show that if $S_1$ and $S_2$ are finite sets with $|S_1| = n$ and $|S_2| = m$, then

$$|S_1 \cup S_2| \le n + m.$$

3. If $S_1$ and $S_2$ are finite sets, show that $|S_1 \times S_2| = |S_1||S_2|$.

4. Consider the relation between two sets defined by $S_1 \equiv S_2$ if and only if $|S_1| = |S_2|$. Show that this is
an equivalence relation.


5. Prove DeMorgan's laws, Equations (1.2) and (1.3). 


6. Occasionally, we need to use the union and intersection symbols in a manner analogous to the
summation sign $\Sigma$. We define

$$\bigcup_{p \in \{i, j, k, \ldots\}} S_p = S_i \cup S_j \cup S_k \cdots$$

with an analogous notation for the intersection of several sets.

With this notation, the general DeMorgan's laws are written as

$$\overline{\bigcup_{p \in P} S_p} = \bigcap_{p \in P} \overline{S_p}$$

and

$$\overline{\bigcap_{p \in P} S_p} = \bigcup_{p \in P} \overline{S_p}.$$

Prove these identities when $P$ is a finite set.


7. Show that

$$S_1 \cup S_2 = \overline{\overline{S_1} \cap \overline{S_2}}.$$


8. Show that $S_1 = S_2$ if and only if


9. Show that 


10. Show that the distributive law

$$S_1 \cap (S_2 \cup S_3) = (S_1 \cap S_2) \cup (S_1 \cap S_3)$$

holds for sets.

11. Show that

$$S_1 \times (S_2 \cup S_3) = (S_1 \times S_2) \cup (S_1 \times S_3).$$

12. Show that if $S_1 \subseteq S_2$, then $\overline{S_2} \subseteq \overline{S_1}$.

13. Give conditions on $S_1$ and $S_2$ necessary and sufficient to ensure that

$$S_1 = (S_1 \cup S_2) - S_2.$$


14. Use the equivalence defined in Example 1.4 to partition the set {2, 4, 5, 6, 9, 23, 24, 25, 31, 37} 
into equivalence classes. 


15. Show that if $f(n) = O(g(n))$ and $g(n) = O(f(n))$, then $f(n) = \Theta(g(n))$.

16. Show that $2^n = O(3^n)$ but $2^n \ne \Theta(3^n)$.

17. Show that the following order-of-magnitude results hold.
(a) $n^2 + 5 \log n = O(n^2)$.
(b) $3^n = O(n!)$.
(c) $n! = O(n^n)$.

18. Prove that if $f(n) = O(g(n))$ and $g(n) = O(h(n))$, then $f(n) = O(h(n))$.
19. Show that if $f(n) = O(n^2)$ and $g(n) = O(n^3)$, then
$f(n) + g(n) = O(n^3)$
and
$f(n) g(n) = O(n^5)$.
20. Assume that $f(n) = 2n^2 + n$ and $g(n) = O(n^2)$. What is wrong with the following argument?
$f(n) = O(n^2) + O(n)$,
so that
$f(n) - g(n) = O(n^2) + O(n) - O(n^2)$.
Therefore,
$f(n) - g(n) = O(n)$.
21. Show that if $f(n) = \Theta(\log_2 n)$, then $f(n) = \Theta(\log_{10} n)$.


22. Draw a picture of the graph with vertices $\{v_1, v_2, v_3\}$ and edges $\{(v_1, v_1), (v_1, v_2), (v_2, v_3), (v_2, v_1), (v_3, v_1)\}$.
Enumerate all cycles with base $v_1$.


23. Let G = (V, E) be any graph. Prove the following claim: If there is any walk between $v_i \in V$ and
$v_j \in V$, then there must be a path of length no larger than $|V| - 1$ between these two vertices.


24. Consider graphs in which there is at most one edge between any two vertices. Show that under
this condition a graph with n vertices has at most $n^2$ edges.


25. Show that

$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.$


26. Show that


27. Prove that for all $n \ge 4$ the inequality $2^n < n!$ holds.


28. The Fibonacci sequence is defined recursively by
$f(n+2) = f(n+1) + f(n), \quad n = 1, 2, \ldots,$
with $f(1) = 1$, $f(2) = 1$. Show that
(a) $f(n) = O(2^n)$,
(b) $f(n) = \Omega(1.5^n)$.


29. Show that $\sqrt{8}$ is not a rational number.


30. Show that $2 - \sqrt{2}$ is irrational.


31. Show that $\sqrt{3}$ is irrational.


32. Prove or disprove the following statements.
(a) The sum of a rational and an irrational number must be irrational.
(b) The sum of two positive irrational numbers must be irrational.
(c) The product of a non-zero rational and an irrational number must be irrational.


33. Show that every positive integer can be expressed as the product of prime numbers.


34. Prove that the set of all prime numbers is infinite.


35. A prime pair consists of two primes that differ by two. There are many prime pairs, for example,
11 and 13, 17 and 19, etc. Prime triplets are three numbers $n \ge 2$, $n + 2$, $n + 4$ that are all prime.
Show that the only prime triplet is (3, 5, 7).


1.2 Three Basic Concepts 


Three fundamental ideas are the major themes of this book: languages, grammars, and automata. In 
the course of our study we will explore many results about these concepts and about their relationship 
to each other. First, we must understand the meaning of the terms. 


Languages 


We are all familiar with the notion of natural languages, such as English and French. Still, most of us 
would probably find it difficult to say exactly what the word “language” means. Dictionaries define 
the term informally as a system suitable for the expression of certain ideas, facts, or concepts, 
including a set of symbols and rules for their manipulation. While this gives us an intuitive idea of 
what a language is, it is not sufficient as a definition for the study of formal languages. We need a 
precise definition for the term. 

We start with a finite, nonempty set $\Sigma$ of symbols, called the alphabet. From the individual
symbols we construct strings, which are finite sequences of symbols from the alphabet. For example,
if the alphabet $\Sigma = \{a, b\}$, then abab and aaabbba are strings on $\Sigma$. With few exceptions, we will use
lowercase letters a, b, c, ... for elements of $\Sigma$ and u, v, w, ... for string names. We will write, for
example,


w = abaaa 


to indicate that the string named w has the specific value abaaa. 


The concatenation of two strings w and v is the string obtained by appending the symbols of v to
the right end of w, that is, if

$w = a_1 a_2 \cdots a_n$
and

$v = b_1 b_2 \cdots b_m,$


then the concatenation of w and v, denoted by wv, is

$wv = a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m.$


The reverse of a string is obtained by writing the symbols in reverse order; if w is a string as shown
above, then its reverse $w^R$ is

$w^R = a_n \cdots a_2 a_1.$


The length of a string w, denoted by |w|, is the number of symbols in the string. We will frequently
need to refer to the empty string, which is a string with no symbols at all. It will be denoted by $\lambda$.
The following simple relations

$|\lambda| = 0,$
$\lambda w = w \lambda = w$

hold for all w.


Any string of consecutive symbols in some w is said to be a substring of w. If

$w = vu,$

then the substrings v and u are said to be a prefix and a suffix of w, respectively. For example, if w =
abbab, then $\{\lambda, a, ab, abb, abba, abbab\}$ is the set of all prefixes of w, while bab, ab, b are some of
its suffixes.
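
The string operations introduced so far correspond closely to ordinary string handling in a programming language. The following Python sketch is only an illustration: the function names are our own choice, and the empty Python string "" is assumed to play the role of the empty string.

```python
def concat(w, v):
    """Concatenation wv: the symbols of v appended to the right end of w."""
    return w + v


def reverse(w):
    """The reverse of w: the symbols of w written in reverse order."""
    return w[::-1]


def prefixes(w):
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w):
    """All suffixes of w, from w itself down to the empty string."""
    return [w[i:] for i in range(len(w) + 1)]


# The example from the text: prefixes and suffixes of w = abbab.
print(prefixes("abbab"))
print(suffixes("abbab"))
```

A string of length n has n + 1 prefixes and n + 1 suffixes, counting the empty string and the string itself.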


Simple properties of strings, such as their length, are very intuitive and probably need little 
elaboration. For example, if u and v are strings, then the length of their concatenation is the sum of the 
individual lengths, that is,

$|uv| = |u| + |v|. \qquad (1.6)$

But although this relationship is obvious, it is useful to be able to make it precise and prove it.
The techniques for doing so are important in more complicated situations.

Example 1.8

Show that (1.6) holds for any u and v. To prove this, we first need a definition of the length of a
string. We make such a definition in a recursive fashion by

$|a| = 1,$
$|wa| = |w| + 1,$

for all $a \in \Sigma$ and w any string on $\Sigma$. This definition is a formal statement of our intuitive
understanding of the length of a string: The length of a single symbol is one, and the length of any
string is increased by one if we add another symbol to it. With this formal definition, we are ready to
prove (1.6) by induction.


By definition, (1.6) holds for all u of any length and all v of length 1, so we have a basis. As an 
inductive assumption, we take that (1.6) holds for all u of any length and all v of length 1, 2,..., n. 
Now take any v of length n + 1 and write it as v = wa. Then, 


$|v| = |w| + 1,$

$|uv| = |uwa| = |uw| + 1.$
By the inductive hypothesis (which is applicable since w is of length n),
$|uw| = |u| + |w|,$
so that
$|uv| = |u| + |w| + 1 = |u| + |v|.$
Therefore, (1.6) holds for all u and all v of length up to n + 1, completing the inductive step and the 


argument. 
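
The recursive definition of length can be transcribed almost literally into code. The sketch below is an illustration under our own assumptions: the names length, all_strings, and check_additivity are hypothetical, the case of the empty string is added so that the function is defined everywhere, and the exhaustive check over short strings is a sanity test, not a substitute for the inductive proof.

```python
from itertools import product


def length(w):
    """Length of w via the recursive rules |a| = 1 and |wa| = |w| + 1."""
    if w == "":
        return 0                 # added case: the empty string has length 0
    head = w[:-1]                # w = head + last symbol
    if head == "":
        return 1                 # a single symbol has length one
    return length(head) + 1      # one more symbol adds one to the length


def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0, 1, ..., max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def check_additivity(alphabet="ab", max_len=3):
    """Check |uv| = |u| + |v| for all u, v up to the given length."""
    return all(
        length(u + v) == length(u) + length(v)
        for u in all_strings(alphabet, max_len)
        for v in all_strings(alphabet, max_len)
    )
```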


If w is a string, then $w^n$ stands for the string obtained by repeating w n times. As a special case,
we define

$w^0 = \lambda,$

for all w.


If $\Sigma$ is an alphabet, then we use $\Sigma^*$ to denote the set of strings obtained by concatenating zero or
more symbols from $\Sigma$. The set $\Sigma^*$ always contains $\lambda$. To exclude the empty string, we define

$\Sigma^+ = \Sigma^* - \{\lambda\}.$

While $\Sigma$ is finite by assumption, $\Sigma^*$ and $\Sigma^+$ are always infinite since there is no limit on the length of
the strings in these sets. A language is defined very generally as a subset of $\Sigma^*$. A string in a language
L will be called a sentence of L. This definition is quite broad; any set of strings on an alphabet $\Sigma$
can be considered a language. Later we will study methods by which specific languages can be 
defined and described; this will enable us to give some structure to this rather broad concept. For the 
moment, though, we will just look at a few specific examples. 


Example 1.9 


Let $\Sigma = \{a, b\}$. Then
$\Sigma^* = \{\lambda, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}.$
The set
$\{a, aa, aab\}$
is a language on $\Sigma$. Because it has a finite number of sentences, we call it a finite language. The set

$L = \{a^n b^n : n \ge 0\}$

is also a language on $\Sigma$. The strings aabb and aaaabbbb are in the language L, but the string abb is
not in L. This language is infinite. Most interesting languages are infinite.
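
Both sets in this example can be explored with a few lines of code. The following Python sketch is illustrative only; the names sigma_star and in_anbn are our own, and "" stands for the empty string. Since the set of all strings on an alphabet is infinite, it is produced lazily, in order of increasing length.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Yield all strings over alphabet, shortest first."""
    for n in count(0):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def in_anbn(w):
    """True if w consists of n a's followed by n b's for some n >= 0."""
    n, remainder = divmod(len(w), 2)
    return remainder == 0 and w == "a" * n + "b" * n


# The first few strings on {a, b}, in the order listed in the example.
print(list(islice(sigma_star("ab"), 9)))
```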


Since languages are sets, the union, intersection, and difference of two languages are immediately
defined. The complement of a language is defined with respect to $\Sigma^*$; that is, the complement of L is

$\overline{L} = \Sigma^* - L.$

The reverse of a language is the set of all string reversals, that is,
$L^R = \{w^R : w \in L\}.$


The concatenation of two languages $L_1$ and $L_2$ is the set of all strings obtained by concatenating any
element of $L_1$ with any element of $L_2$; specifically,

$L_1 L_2 = \{xy : x \in L_1, y \in L_2\}.$

We define $L^n$ as L concatenated with itself n times, with the special cases

$L^0 = \{\lambda\}$
and
$L^1 = L$
for every language L.
Finally, we define the star-closure of a language as
$L^* = L^0 \cup L^1 \cup L^2 \cdots$
and the positive closure as
$L^+ = L^1 \cup L^2 \cdots.$


Example 1.10 


If
$L = \{a^n b^n : n \ge 0\},$
then
$L^2 = \{a^n b^n a^m b^m : n \ge 0, m \ge 0\}.$
Note that n and m in the above are unrelated; the string aabbaaabbb is in $L^2$.


The reverse of L is easily described in set notation as

$L^R = \{b^n a^n : n \ge 0\},$

but it is considerably harder to describe $\overline{L}$ or $L^*$ this way. A few tries will quickly convince you of
the limitation of set notation for the specification of complicated languages.
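
For finite languages, the operations just defined can be carried out directly on Python sets of strings. The sketch below is an illustration under stated assumptions: the function names are hypothetical, "" stands for the empty string, and because the star-closure is usually infinite, star_up_to only collects its strings up to a chosen length.

```python
def concat(L1, L2):
    """All strings xy with x in L1 and y in L2."""
    return {x + y for x in L1 for y in L2}


def power(L, n):
    """L concatenated with itself n times; power(L, 0) is {empty string}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def reverse(L):
    """The set of reversals of the strings in L."""
    return {w[::-1] for w in L}


def star_up_to(L, max_len):
    """The strings of the star-closure of L that have length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of L, keep short ones.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result


# A finite piece of the language from Example 1.10: n = 0, 1, 2, 3.
L = {"a" * n + "b" * n for n in range(4)}
print("aabbaaabbb" in power(L, 2))
```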


Grammars 


To study languages mathematically, we need a mechanism to describe them. Everyday language is 
imprecise and ambiguous, so informal descriptions in English are often inadequate. The set notation 
used in Examples 1.9 and 1.10 is more suitable, but limited. As we proceed we will learn about 
several language-definition mechanisms that are useful in different circumstances. Here we introduce 
a common and powerful one, the notion of a grammar. 


A grammar for the English language tells us whether a particular sentence is well-formed or not. 


A typical rule of English grammar is “a sentence can consist of a noun phrase followed by a
predicate.” More concisely we write this as

$\langle\text{sentence}\rangle \to \langle\text{noun\_phrase}\rangle \langle\text{predicate}\rangle,$

with the obvious interpretation. This is, of course, not enough to deal with actual sentences. We must
now provide definitions for the newly introduced constructs $\langle\text{noun\_phrase}\rangle$ and $\langle\text{predicate}\rangle$. If we do
so by

$\langle\text{noun\_phrase}\rangle \to \langle\text{article}\rangle \langle\text{noun}\rangle,$

$\langle\text{predicate}\rangle \to \langle\text{verb}\rangle,$

and if we associate the actual words “a” and “the” with $\langle\text{article}\rangle$, “boy” and “dog” with $\langle\text{noun}\rangle$, and
“runs” and “walks” with $\langle\text{verb}\rangle$, then the grammar tells us that the sentences “a boy runs” and “the
dog walks” are properly formed. If we were to give a complete grammar, then in theory, every proper 
sentence could be explained this way. 


This example illustrates the definition of a general concept in terms of simple ones. We start with 


the top-level concept, here $\langle\text{sentence}\rangle$, and successively reduce it to the irreducible building blocks
of the language. The generalization of these ideas leads us to formal grammars. 


Definition 1.1 


A grammar G is defined as a quadruple
G = (V, T, S, P),

where V is a finite set of objects called variables,
T is a finite set of objects called terminal symbols,
$S \in V$ is a special symbol called the start variable,
P is a finite set of productions.

It will be assumed without further mention that the sets V and T are nonempty and disjoint.


The production rules are the heart of a grammar; they specify how the grammar transforms one 
string into another, and through this they define a language associated with the grammar. In our 
discussion we will assume that all production rules are of the form 


$x \to y,$
where x is an element of $(V \cup T)^+$ and y is in $(V \cup T)^*$. The productions are applied in the following
manner: Given a string w of the form

$w = uxv,$

we say the production $x \to y$ is applicable to this string, and we may use it to replace x with y,
thereby obtaining a new string

$z = uyv.$
This is written as

$w \Rightarrow z.$

We say that w derives z or that z is derived from w. Successive strings are derived by applying the
productions of the grammar in arbitrary order. A production can be used whenever it is applicable,
and it can be applied as often as desired. If

$w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_n,$
we say that $w_1$ derives $w_n$ and write

$w_1 \overset{*}{\Rightarrow} w_n.$

The * indicates that an unspecified number of steps (including zero) can be taken to derive $w_n$ from
$w_1$.


By applying the production rules in a different order, a given grammar can normally generate 
many strings. The set of all such terminal strings is the language defined or generated by the grammar. 


Definition 1.2 
Let G = (V, T, S, P) be a grammar. Then the set
$L(G) = \{w \in T^* : S \overset{*}{\Rightarrow} w\}$
is the language generated by G.


If $w \in L(G)$, then the sequence

$S \Rightarrow w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_n \Rightarrow w$

is a derivation of the sentence w. The strings $S, w_1, w_2, \ldots, w_n$, which contain variables as well as
terminals, are called sentential forms of the derivation.


Example 1.11 


Consider the grammar

$G = (\{S\}, \{a, b\}, S, P),$

with P given by

$S \to aSb,$
$S \to \lambda.$


Then
$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb,$
so we can write
$S \overset{*}{\Rightarrow} aabb.$


The string aabb is a sentence in the language generated by G, while aaSbb is a sentential form.


A grammar G completely defines L(G), but it may not be easy to get a very explicit description of
the language from the grammar. Here, however, the answer is fairly clear. It is not hard to conjecture
that

$L(G) = \{a^n b^n : n \ge 0\},$

and it is easy to prove it. If we notice that the rule $S \to aSb$ is recursive, a proof by induction readily
suggests itself. We first show that all sentential forms must have the form

$w_i = a^i S b^i. \qquad (1.7)$

Suppose that (1.7) holds for all sentential forms $w_i$ of length 2i + 1 or less. To get another sentential
form (which is not a sentence), we can only apply the production $S \to aSb$. This gets us

$a^i S b^i \Rightarrow a^{i+1} S b^{i+1},$

so that every sentential form of length 2i + 3 is also of the form (1.7). Since (1.7) is obviously true for
i = 1, it holds by induction for all i. Finally, to get a sentence, we must apply the production $S \to \lambda$,
and we see that

$S \overset{*}{\Rightarrow} a^n S b^n \Rightarrow a^n b^n$

represents all possible derivations. Thus, G can derive only strings of the form $a^n b^n$.


We also have to show that all strings of this form can be derived. This is easy; we simply apply
$S \to aSb$ as many times as needed, followed by $S \to \lambda$.
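
Derivations of this kind are easy to mechanize, which gives a quick way to test a conjecture about L(G) before proving it. The following Python sketch is an illustration under our own assumptions, not part of the formal development: productions are given as pairs (x, y), upper-case letters stand for variables, lower-case letters for terminals, and "" stands for the empty string. The names generate and slack are hypothetical. Because the search discards sentential forms longer than max_len + slack, it may miss sentences of grammars whose derivations pass through longer intermediate forms.

```python
from collections import deque


def generate(productions, start, max_len, slack=2):
    """Return the sentences of length <= max_len found by breadth-first search."""
    limit = max_len + slack          # longest sentential form we keep
    sentences = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            # No variables left: the form is a terminal string.
            if len(form) <= max_len:
                sentences.add(form)
            continue
        for x, y in productions:
            i = form.find(x)
            while i != -1:
                # Apply x -> y at position i: uxv becomes uyv.
                new = form[:i] + y + form[i + len(x):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(x, i + 1)
    return sentences


# The grammar of this example: S -> aSb, S -> (empty string).
P = [("S", "aSb"), ("S", "")]
print(sorted(generate(P, "S", 6), key=len))
```

Every string found has the same number of a's and b's with all a's first, in agreement with the conjecture; the search, of course, only examines finitely many derivations.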


Example 1.12 


Find a grammar that generates

$L = \{a^n b^{n+1} : n \ge 0\}.$

The idea behind the previous example can be extended to this case. All we need to do is generate an
extra b. This can be done with a production $S \to Ab$, with other productions chosen so that A can
derive the language in the previous example. Reasoning in this fashion, we get the grammar G = ({S,
A}, {a, b}, S, P), with productions

$S \to Ab,$
$A \to aAb,$
$A \to \lambda.$


Derive a few specific sentences to convince yourself that this works. 
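
As a starting point, here are the derivations of the three shortest sentences, written out step by step using only the three productions of the grammar (each line uses $A \to aAb$ the indicated number of times, then $A \to \lambda$):

```latex
\begin{align*}
n = 0:&\quad S \Rightarrow Ab \Rightarrow b \\
n = 1:&\quad S \Rightarrow Ab \Rightarrow aAbb \Rightarrow abb \\
n = 2:&\quad S \Rightarrow Ab \Rightarrow aAbb \Rightarrow aaAbbb \Rightarrow aabbb
\end{align*}
```

In each case the result has one more b than a, as required.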


The previous examples are fairly easy ones, so rigorous arguments may seem superfluous. But 
often it is not so easy to find a grammar for a language described in an informal way or to give an 
intuitive characterization of the language defined by a grammar. To show that a given language is 
indeed generated by a certain grammar G, we must be able to show (a) that every $w \in L$ can be
derived from S using G and (b) that every string so derived is in L.


Example 1.13 


Take $\Sigma = \{a, b\}$, and let $n_a(w)$ and $n_b(w)$ denote the number of a’s and b’s in the string w,
respectively. Then the grammar G with productions

$S \to SS,$
$S \to \lambda,$
$S \to aSb,$
$S \to bSa$

generates the language
$L = \{w : n_a(w) = n_b(w)\}.$


This claim is not so obvious, and we need to provide convincing arguments.


First, it is clear that every sentential form of G has an equal number of a’s and b’s, since the only
productions that generate an a, namely $S \to aSb$ and $S \to bSa$, simultaneously generate a b.
Therefore, every element of L(G) is in L. It is a little harder to see that every string in L can be
derived with G.


Let us begin by looking at the problem in outline, considering the various forms w € L can have. 
Suppose w starts with a and ends with b. Then it has the form

$w = a w_1 b,$
where $w_1$ is also in L. We can think of this case as being derived starting with

$S \Rightarrow aSb$

if S does indeed derive any string in L. A similar argument can be made if w starts with b and ends
with a. But this does not take care of all cases, since a string in L can begin and end with the same
symbol. If we write down a string of this type, say aabbba, we see that it can be considered as the
concatenation of two shorter strings aabb and ba, both of which are in L. Is this true in general? To
show that this is indeed so, we can use the following argument: Suppose that, starting at the left end of
the string, we count +1 for an a and -1 for a b. If a string w starts and ends with a, then the count will
be +1 after the leftmost symbol and -1 immediately before the rightmost one. Therefore, the count has
to go through zero somewhere in the middle of the string, indicating that such a string must have the
form

$w = w_1 w_2,$
where both $w_1$ and $w_2$ are in L. This case can be taken care of by the production $S \to SS$.


Once we see the argument intuitively, we are ready to proceed more rigorously. Again we use 
induction. Assume that all $w \in L$ with $|w| \le 2n$ can be derived with G. Take any $w \in L$ of length 2n +
2. If $w = a w_1 b$, then $w_1$ is in L, and $|w_1| = 2n$. Therefore, by assumption,

$S \overset{*}{\Rightarrow} w_1.$
Then
$S \Rightarrow aSb \overset{*}{\Rightarrow} a w_1 b = w$

is possible, and w can be derived with G. Obviously, similar arguments can be made if $w = b w_1 a$.


If w is not of this form, that is, if it starts and ends with the same symbol, then the counting
argument tells us that it must have the form $w = w_1 w_2$, with $w_1$ and $w_2$ both in L and of length less than
or equal to 2n. Hence again we see that

$S \Rightarrow SS \overset{*}{\Rightarrow} w_1 S \overset{*}{\Rightarrow} w_1 w_2 = w$


is possible. 


Since the inductive assumption is clearly satisfied for n = 1, we have a basis, and the claim is true 
for all n, completing our argument. 
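
The case analysis in this argument is constructive, so it can be turned into a procedure that finds a derivation for any string in L. The Python sketch below is an illustration with hypothetical names (balanced, split_balanced, derive, yield_of); it records, as a nested tuple, which production is applied at each stage, and uses the counting argument to split strings that begin and end with the same symbol.

```python
def balanced(w):
    """True if w is a string on {a, b} with equally many a's and b's."""
    return set(w) <= {"a", "b"} and w.count("a") == w.count("b")


def split_balanced(w):
    """Split w at the first point where the +1/-1 count returns to zero."""
    count = 0
    for i, c in enumerate(w[:-1], start=1):
        count += 1 if c == "a" else -1
        if count == 0:
            return w[:i], w[i:]
    return None


def derive(w):
    """Return a nested tuple naming the production used at each step."""
    if not balanced(w):
        raise ValueError("string is not in L")
    if w == "":
        return ("S -> lambda",)
    if w[0] == "a" and w[-1] == "b":
        return ("S -> aSb", derive(w[1:-1]))
    if w[0] == "b" and w[-1] == "a":
        return ("S -> bSa", derive(w[1:-1]))
    w1, w2 = split_balanced(w)       # same first and last symbol
    return ("S -> SS", derive(w1), derive(w2))


def yield_of(tree):
    """Rebuild the derived string from the structure returned by derive."""
    rule = tree[0]
    if rule == "S -> lambda":
        return ""
    if rule == "S -> aSb":
        return "a" + yield_of(tree[1]) + "b"
    if rule == "S -> bSa":
        return "b" + yield_of(tree[1]) + "a"
    return yield_of(tree[1]) + yield_of(tree[2])
```

For the string aabbba discussed above, split_balanced returns the two pieces aabb and ba, exactly the decomposition used in the text.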


Normally, a given language has many grammars that generate it. Even though these grammars are 
different, they are equivalent in some sense. We say that two grammars $G_1$ and $G_2$ are equivalent if
they generate the same language, that is, if
$L(G_1) = L(G_2).$
As we will see later, it is not always easy to see if two grammars are equivalent. 


Example 1.14 


Consider the grammar $G_1 = (\{A, S\}, \{a, b\}, S, P_1)$, with $P_1$ consisting of the productions

$S \to aAb \mid \lambda,$
$A \to aAb \mid \lambda.$


Here we introduce a convenient shorthand notation in which several production rules with the same
left-hand sides are written on the same line, with alternative right-hand sides separated by |. In this
notation $S \to aAb \mid \lambda$ stands for the two productions $S \to aAb$ and $S \to \lambda$.

This grammar is equivalent to the grammar G in Example 1.11. The equivalence is easy to prove
by showing that

$L(G_1) = \{a^n b^n : n \ge 0\}.$


We leave this as an exercise. 


Automata 


An automaton is an abstract model of a digital computer. As such, every automaton includes some 
essential features. It has a mechanism for reading input. It will be assumed that the input is a string 
over a given alphabet, written on an input file, which the automaton can read but not change. The 
input file is divided into cells, each of which can hold one symbol. The input mechanism can read the 
input file from left to right, one symbol at a time. The input mechanism can also detect the end of the 
input string (by sensing an end-of-file condition). The automaton can produce output of some form. It 
may have a temporary storage device, consisting of an unlimited number of cells, each capable of 
holding a single symbol from an alphabet (not necessarily the same one as the input alphabet). The 
automaton can read and change the contents of the storage cells. Finally, the automaton has a control 
unit, which can be in any one of a finite number of internal states, and which can change state in 
some defined manner. Figure 1.4 shows a schematic representation of a general automaton. 

An automaton is assumed to operate in a discrete timeframe. At any given time, the control unit is 
in some internal state, and the input mechanism is scanning a particular symbol on the input file. The 
internal state of the control unit at the next time step is determined by the next-state or transition 
function. This transition function gives the next state in terms of the current state, the current input 
symbol, and the information currently in the temporary storage. During the transition from one time 
interval to the next, output may be produced or the information in the temporary storage changed. The 
term configuration will be used to refer to a particular state of the control unit, input file, and 
temporary storage. The transition of the automaton from one configuration to the next will be called a 
move. 


Figure 1.4 (schematic representation of a general automaton; surviving labels: input file, control unit)


This general model covers all the automata we will discuss in this book. A finite-state control
will be common to all specific cases, but differences will arise from the way in which the output can 
be produced and the nature of the temporary storage. As we will see, the nature of the temporary 
storage governs the power of different types of automata. 


For subsequent discussions, it will be necessary to distinguish between deterministic automata 
and nondeterministic automata. A deterministic automaton is one in which each move is uniquely 
determined by the current configuration. If we know the internal state, the input, and the contents of the 
temporary storage, we can predict the future behavior of the automaton exactly. In a nondeterministic 
automaton, this is not so. At each point, a nondeterministic automaton may have several possible 
moves, so we can only predict a set of possible actions. The relation between deterministic and 
nondeterministic automata of various types will play a significant role in our study. 


An automaton whose output response is limited to a simple “yes” or “no” is called an accepter. 
Presented with an input string, an accepter either accepts the string or rejects it. A more general 
automaton, capable of producing strings of symbols as output, is called a transducer. 


EXERCISES 


1. Use induction on n to show that $|u^n| = n|u|$ for all strings u and all n.


2. The reverse of a string, introduced informally above, can be defined more precisely by the
recursive rules

$a^R = a,$
$(wa)^R = a w^R,$

for all $a \in \Sigma$, $w \in \Sigma^*$. Use this to prove that
$(uv)^R = v^R u^R$
for all $u, v \in \Sigma^+$.


3. Prove that $(w^R)^R = w$ for all $w \in \Sigma^*$.


4. Let L = {ab, aa, baa}. Which of the following strings are in $L^*$: abaabaaabaa, aaaabaaaa,
baaaaabaaaab, baaaaabaa? Which strings are in $L^4$?


5. Let $\Sigma = \{a, b\}$ and L = {aa, bb}. Use set notation to describe $\overline{L}$.


6. Let L be any language on a non-empty alphabet. Show that L and $\overline{L}$ cannot both be finite.
7. Are there languages for which $\overline{L^*} = (\overline{L})^*$?
8. Prove that 

ERE A EE BEEF 


for all languages $L_1$ and $L_2$.


9. Show that (L*)* = L* for all languages. 
10. Prove or disprove the following claims. 
(a) $(L_1 \cup L_2)^R = L_1^R \cup L_2^R$ for all languages $L_1$ and $L_2$.


(b) $(L^R)^* = (L^*)^R$ for all languages L.


11. Find grammars for $\Sigma = \{a, b\}$ that generate the sets of
(a) all strings with exactly one a. 


(b) all strings with at least one a. 
(c) all strings with no more than three a’s. 
(d) all strings with at least three a’s. 


In each case, give convincing arguments that the grammar you give does indeed generate the 
indicated language. 


12. Give a simple description of the language generated by the grammar with productions
$S \to aA,$
$A \to bS,$
$S \to \lambda.$
13. What language does the grammar with these productions generate?

$S \to Aa,$
$A \to B,$
$B \to Aa.$


14. Let $\Sigma = \{a, b\}$. For each of the following languages, find a grammar that generates it.
(a) $L_1 = \{a^n b^m : n \ge 0, m > n\}$.


(b) $L_2 = \{a^n b^{2n} : n \ge 0\}$.
(c) L3 = {abt : n> 1}. 
(d) Ly = {a"b" : n23). 
(e) L, Ly. 

® L U Ly. 

(g) Li.. 

(h) Li. 

(i) ir- Tù. 


*15. Find grammars for the following languages on $\Sigma = \{a\}$.
(a) $L = \{w : |w| \bmod 3 = 0\}$.


(b) $L = \{w : |w| \bmod 3 > 0\}$.


(c) $L = \{w : |w| \bmod 3 \ne |w| \bmod 2\}$.


(d) L= {w : |w| mod 3 > |w| mod 2}. 
16. Find a grammar that generates the language 
fa [ww pge {a,}*}. 
Give a complete justification for your answer. 


17. Give a verbal description of the language generated by

$S \to aSb \mid bSa \mid a.$


18. Using the notation of Example 1.13, find grammars for the languages below. Assume $\Sigma = \{a, b\}$.
(a) $L = \{w : n_a(w) = n_b(w) + 1\}$.


(b) $L = \{w : n_a(w) > n_b(w)\}$.


*(c) $L = \{w : n_a(w) = 2 n_b(w)\}$.


(d) $L = \{w \in \{a, b\}^* : |n_a(w) - n_b(w)| = 1\}$.


19. Repeat the previous exercise with $\Sigma = \{a, b, c\}$.
20. Complete the arguments in Example 1.14, showing that $G_1$ does in fact generate the given
language.


21. Are the two grammars with respective productions

$S \to aSb \mid ab \mid \lambda,$

and

$S \to aAb \mid ab,$
$A \to aAb \mid \lambda$
equivalent? Assume that S is the start symbol in both cases.


22. Show that the grammar G = ({S}, {a, b}, S, P), with productions

$S \to SS \mid SSS \mid aSb \mid bSa \mid \lambda,$
is equivalent to the grammar in Example 1.13.


23. Show that the grammars

$S \to aSb \mid bSa \mid SS \mid a$

and

$S \to aSb \mid bSa \mid a$


are not equivalent. 


1.3 Some Applications*


Although we stress the abstract and mathematical nature of formal languages and automata, it turns out 
that these concepts have widespread applications in computer science and are, in fact, a common 
theme that connects many specialty areas. In this section, we present some simple examples to give 
the reader some assurance that what we study here is not just a collection of abstractions, but is 
something that helps us understand many important, real problems. 


Formal languages and grammars are used widely in connection with programming languages. In 
most of our programming, we work with a more or less intuitive understanding of the language in 
which we write. Occasionally though, when using an unfamiliar feature, we may need to refer to 
precise descriptions such as the syntax diagrams found in most programming texts. If we write a 
compiler, or if we wish to reason about the correctness of a program, a precise description of the 
language is needed at almost every step. Among the ways in which programming languages can be 
defined precisely, grammars are perhaps the most widely used. 


The grammars that describe a typical language like Pascal or C are very extensive. For an 
example, let us take a smaller language that is part of a larger one. 


Example 1.15 


The rules for variable identifiers in C are 
1. An identifier is a sequence of letters, digits, and underscores. 
2. An identifier must start with a letter or an underscore.


3. Identifiers allow upper- and lower-case letters. 
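Formally, these rules can be described by a grammar.

$\langle\text{id}\rangle \to \langle\text{letter}\rangle\langle\text{rest}\rangle \mid \langle\text{undrscr}\rangle\langle\text{rest}\rangle$
$\langle\text{rest}\rangle \to \langle\text{letter}\rangle\langle\text{rest}\rangle \mid \langle\text{digit}\rangle\langle\text{rest}\rangle \mid \langle\text{undrscr}\rangle\langle\text{rest}\rangle \mid \lambda$
$\langle\text{letter}\rangle \to a \mid b \mid \cdots \mid z \mid A \mid B \mid \cdots \mid Z$
$\langle\text{digit}\rangle \to 0 \mid 1 \mid \cdots \mid 9$
$\langle\text{undrscr}\rangle \to \_$


In this grammar, the variables are $\langle\text{id}\rangle$, $\langle\text{letter}\rangle$, $\langle\text{digit}\rangle$, $\langle\text{undrscr}\rangle$, and $\langle\text{rest}\rangle$. The letters, digits,
and the underscore are terminals. A derivation of a0 is

$\langle\text{id}\rangle \Rightarrow \langle\text{letter}\rangle\langle\text{rest}\rangle$
$\Rightarrow a\langle\text{rest}\rangle$
$\Rightarrow a\langle\text{digit}\rangle\langle\text{rest}\rangle$
$\Rightarrow a0\langle\text{rest}\rangle$
$\Rightarrow a0.$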


The definition of programming languages through grammars is common and very useful. But there 
are alternatives that are often convenient. For example, we can describe a language by an accepter, 
taking every string that is accepted as part of the language. To talk about this in a precise way, we 
will need to give a more formal definition of an automaton. We will do this shortly; for the moment, 
let us proceed in a more intuitive way. 

An automaton can be represented by a graph in which the vertices give the internal states and the 
edges transitions. The labels on the edges show what happens (in terms of input and output) during the 
transition. For example, Figure 1.5 represents a transition from State 1 to State 2, which is taken when
the input symbol is a. With this intuitive picture in mind, let us look at another way of describing C 
identifiers. 


Figure 1.5 (an edge labeled a from State 1 to State 2)


Example 1.16 


Figure 1.6 is an automaton that accepts all legal C identifiers. Some interpretation is necessary. We 
assume that initially the automaton is in State 1; we indicate this by drawing an arrow (not originating 
in any vertex) to this state. As always, the string to be examined is read left to right, one character at 
each step. When the first symbol is a letter or an underscore, the automaton goes into State 2, after 
which the rest of the string is immaterial. State 2 therefore represents the “yes” state of the accepter. 
Conversely, if the first symbol is a digit, the automaton will go into State 3, the “no” state, and remain 
there. In our solution, we assume that no input other than letters, digits, or underscores is possible. 
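
The behavior described here can be simulated with a small table-driven program. The following Python sketch is an illustration under our own assumptions: the state numbers follow the description above, while the names symbol_class, DELTA, and accepts are hypothetical, letters are taken to be the ASCII letters, and reserved words of C are not considered.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def symbol_class(c):
    """Classify an input symbol as the edge labels of the automaton do."""
    if c in LETTERS or c == "_":
        return "letter_or_undrscr"
    if c in DIGITS:
        return "digit"
    raise ValueError("input symbol outside the assumed alphabet: " + repr(c))


# Transition function: (current state, symbol class) -> next state.
DELTA = {
    (1, "letter_or_undrscr"): 2,   # a legal first symbol: go to the "yes" state
    (1, "digit"): 3,               # an illegal first symbol: go to the "no" state
    (2, "letter_or_undrscr"): 2,   # the rest of the string is immaterial
    (2, "digit"): 2,
    (3, "letter_or_undrscr"): 3,   # once in the "no" state, stay there
    (3, "digit"): 3,
}


def accepts(s):
    """Read s left to right from State 1; accept if we end in State 2."""
    state = 1
    for c in s:
        state = DELTA[(state, symbol_class(c))]
    return state == 2
```

Note that the empty string is rejected, since the automaton is then still in State 1, which is not the accepting state.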


Figure 1.6 (accepter for C identifiers; surviving edge labels: "letter or undrscr" and "letter, digit, or undrscr")


Compilers and other translators that convert a program from one language to another make 
extensive use of the ideas touched on in these examples. Programming languages can be defined 
precisely through grammars, as in Example 1.15, and both grammars and automata play a fundamental 
role in the decision processes by which a specific piece of code is accepted as satisfying the 
conditions of a programming language. The above example gives a first hint of how this is done; 
subsequent examples will expand on this observation. 

Transducers will be discussed briefly in Appendix A; the following example previews this 
subject. 


Example 1.17 


A binary adder is an integral part of any general-purpose computer. Such an adder takes two bit 
strings representing numbers and produces their sum as output. For simplicity, let us assume that we 
are dealing only with positive integers and that we use a representation in which 


$x = a_0 a_1 \cdots a_n$


stands for the integer

$v(x) = \sum_{i=0}^{n} a_i 2^i.$

This is the usual binary representation in reverse.

A serial adder processes two such numbers $x = a_0 a_1 \cdots a_n$ and $y = b_0 b_1 \cdots b_m$ bit by bit, starting at
the left end. Each bit addition creates a digit for the sum as well as a carry digit for the next higher
position. A binary addition table (Figure 1.7) summarizes the process.


Figure 1.7 (binary addition table; surviving label: "No carry")


A block diagram of the kind we saw when we first studied computers is given in Figure 1.8. It 
tells us that an adder is a box that accepts two bits and produces their sum bit and a possible carry. It 
describes what an adder does, but explains little about its internal workings. An automaton (now a 
transducer) can make this much more explicit. 


The input to the transducer are the bit pairs $(a_i, b_i)$; the output will be the sum bit $d_i$. Again, we
represent the automaton by a graph, now labeling the edges $(a_i, b_i)/d_i$. The carry from one step to the
next is remembered by the automaton via two internal states labeled “carry” and “no carry.” Initially,
the transducer will be in state “no carry.” It will remain in this state until a bit pair (1, 1) is 
encountered; this will generate a carry that takes the automaton into the “carry” state. The presence of 
a carry is then taken into account when the next bit pair is read. A complete picture of a serial adder 
is given in Figure 1.9. Follow this through with a few examples to convince yourself that it works 
correctly. 
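
One way to follow this through is to simulate the transducer. The Python sketch below is an illustration under stated assumptions: the transition table is written out from the rules of binary addition with the two states named in the text, the names DELTA, serial_add, value, and to_bits are hypothetical, and the shorter operand is padded with zeros. One extra (0, 0) pair is appended to the input so that a carry left over after the last bits is written to the output.

```python
# (state, a_i, b_i) -> (next state, sum bit d_i)
DELTA = {
    ("no carry", "0", "0"): ("no carry", "0"),
    ("no carry", "0", "1"): ("no carry", "1"),
    ("no carry", "1", "0"): ("no carry", "1"),
    ("no carry", "1", "1"): ("carry", "0"),
    ("carry", "0", "0"): ("no carry", "1"),
    ("carry", "0", "1"): ("carry", "0"),
    ("carry", "1", "0"): ("carry", "0"),
    ("carry", "1", "1"): ("carry", "1"),
}


def serial_add(x, y):
    """Add two bit strings written lowest-order bit first."""
    n = max(len(x), len(y)) + 1            # extra (0, 0) pair flushes the carry
    x, y = x.ljust(n, "0"), y.ljust(n, "0")
    state = "no carry"
    out = []
    for a, b in zip(x, y):
        state, d = DELTA[(state, a, b)]
        out.append(d)
    return "".join(out)


def value(x):
    """The integer represented by x = a_0 a_1 ... a_n (reversed binary)."""
    return sum(int(a) * 2 ** i for i, a in enumerate(x))


def to_bits(k):
    """Reversed binary representation of a non-negative integer k."""
    return format(k, "b")[::-1]
```

For instance, serial_add("11", "1") processes the pairs (1, 1), (1, 0), (0, 0) and returns "001", the reversed representation of 3 + 1 = 4.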


As this example indicates, the automaton serves as a bridge between the very high-level, 
functional description of a circuit and its logical implementation through transistors, gates, and flip- 
flops. The automaton clearly shows the decision logic, yet it is formal enough to lend itself to precise 
mathematical manipulation. For this reason, digital design methods rely heavily on concepts from 
automata theory. 


Figure 1.8 (block diagram of a serial adder: two input bits, output sum bit $d_i$, and a carry)


Figure 1.9 (serial adder with states "no carry" and "carry"; surviving edge labels: (0, 1)/1, (1, 0)/0, (0, 0)/0, (1, 0)/1, (0, 1)/0, (1, 1)/1, (0, 0)/1)


EXERCISES 


1. Give a grammar for the set of integer numbers in C.
2. Design an accepter for integers in C.


3. Give a grammar that generates all real constants in C.


4. Suppose that a certain programming language permits only identifiers that begin with a letter,
contain at least one but no more than three digits, and can have any number of letters. Give a
grammar and an accepter for such a set of identifiers.


5. Modify the grammar in Example 1.15 so that the identifiers satisfy the following rules: 
(a) C rules, except that an underscore cannot be the leftmost symbol. 
(b) C rules, except that there can be at most one underscore. 


(c) C rules, except that an underscore cannot be followed by a digit.


6. Find a grammar for a certain type of scientific notation for real numbers on which the following 
rules hold: 


(a) The number can be preceded by a + or — sign, or the sign may be absent. 


(b) Numeric values must be of the forma.b, b...b,, where b; is any digit, but a must be a 
nonzero digit. 


(c) The number may be followed by an exponent field of the form e + yy or e — yy, where y can 
be any digit. 


7. In the Roman number system, numbers are represented by strings on the alphabet {M,D, C, L, X, V, 
I}. Design an accepter that accepts such strings only if they are properly formed Roman numbers. 
For simplicity, replace the “subtraction” convention in which the number nine is represented by 


IX with an addition equivalent that uses VII instead. 


8. We assumed that an automaton works in a framework of discrete time steps, but this aspect has
little influence on our subsequent discussion. In digital design, however, the time element assumes
considerable significance.

In order to synchronize signals arriving from different parts of the computer, delay circuitry is
needed. A unit-delay transducer is one that simply reproduces the input (viewed as a continual
stream of symbols) one time unit later. Specifically, if the transducer reads as input a symbol $a$
at time $t$, it will reproduce that symbol as output at time $t + 1$. At time $t = 0$, the transducer
outputs nothing. We indicate this by saying that the transducer translates input $a_1a_2\ldots$ into output
$\lambda a_1a_2\ldots$.

Draw a graph showing how such a unit-delay transducer might be designed for $\Sigma = \{a, b\}$.


9. An n-unit delay transducer is one that reproduces the input n time units later; that is, the input
$a_1a_2\ldots$ is translated into $\lambda^n a_1a_2\ldots$, meaning again that the transducer produces no output for the
first n time slots.

(a) Construct a two-unit delay transducer on $\Sigma = \{a, b\}$.

(b) Show that an n-unit delay transducer must have at least $|\Sigma|^n$ states.


10. The two's complement of a binary string, representing a positive integer, is formed by first 
complementing each bit, then adding one to the lowest-order bit. Design a transducer for 
translating bit strings into their two's complement, assuming that the binary number is represented 
as in Example 1.17, with lower-order bits at the left of the string. 


11. Design a transducer to convert a binary string into octal. For example, the bit string 001101110 
should produce the output 156. 


12. Let $a_1a_2\ldots$ be an input bit string. Design a transducer that computes the parity of every substring
of three bits. Specifically, the transducer should produce output

$$\pi_1 = \pi_2 = 0,$$

$$\pi_i = (a_{i-2} + a_{i-1} + a_i) \bmod 2, \quad i = 3, 4, \ldots.$$

For example, the input 110111 should produce 000001.


13. Design a transducer that accepts bit strings $a_1a_2a_3\ldots$ and computes the binary value of each set of
three consecutive bits modulo five. More specifically, the transducer should produce $m_1, m_2, m_3,
\ldots$, where

$$m_1 = m_2 = 0,$$


mi = (ai—2 + a;-1 + ai) mod 2,i = 3,4,.... 


14. Digital computers normally represent all information by bit strings, using some type of encoding. 
For example, character information can be encoded using the well-known ASCII system. 


For this exercise, consider the two alphabets {a, b, c, d} and {0, 1}, respectively, and an
encoding from the first to the second, defined by a → 00, b → 01, c → 10, d → 11. Construct a
transducer for decoding strings on {0, 1} into the original message. For example, the input
010011 should generate as output bad.


15. Let x and y be two positive binary numbers. Design a transducer whose output is max (x, y). 


Chapter 2 


Finite Automata


Our introduction in the first chapter to the basic concepts of computation, particularly the
discussion of automata, is brief and informal. At this point, we have only a general 
understanding of what an automaton is and how it can be represented by a graph. To 
progress, we must be more precise, provide formal definitions, and start to develop 
rigorous results. We begin with finite accepters, which are a simple, special case of the 
general scheme introduced in the last chapter. This type of automaton is characterized by having no 
temporary storage. Since an input file cannot be rewritten, a finite automaton is severely limited in its 
capacity to “remember” things during the computation. A finite amount of information can be retained 
in the control unit by placing the unit into a specific state. But since the number of such states is finite, 
a finite automaton can only deal with situations in which the information to be stored at any time is 
strictly bounded. The automaton in Example 1.16 is an instance of a finite accepter. 


2.1 Deterministic Finite Accepters 


The first type of automaton we study in detail is the finite accepter that is deterministic in its
operation. We start with a precise formal definition of deterministic accepters.


Deterministic Accepters and Transition Graphs

In common with all automata, a deterministic accepter has internal states, rules for transitions from
one state to another, some input, and ways of making decisions. All of these are incorporated in the
following definition.


Definition 2.1 


A deterministic finite accepter or dfa is defined by the quintuple

$$M = (Q, \Sigma, \delta, q_0, F),$$

where

$Q$ is a finite set of internal states,

$\Sigma$ is a finite set of symbols called the input alphabet,

$\delta : Q \times \Sigma \to Q$ is a total function called the transition function,

$q_0 \in Q$ is the initial state,

$F \subseteq Q$ is a set of final states.


A deterministic finite accepter operates in the following manner. At the initial time, it is assumed
to be in the initial state $q_0$, with its input mechanism on the leftmost symbol of the input string. During
each move of the automaton, the input mechanism advances one position to the right, so each move
consumes one input symbol. When the end of the string is reached, the string is accepted if the
automaton is in one of its final states. Otherwise the string is rejected. The input mechanism can move
only from left to right and reads exactly one symbol on each step. The transitions from one internal
state to another are governed by the transition function $\delta$. For example, if

$$\delta(q_0, a) = q_1,$$

then if the dfa is in state $q_0$ and the current input symbol is $a$, the dfa will go into state $q_1$.


In discussing automata, it is essential to have a clear and intuitive picture to work with. To
visualize and represent finite automata, we use transition graphs, in which the vertices represent
states and the edges represent transitions. The labels on the vertices are the names of the states, while
the labels on the edges are the current values of the input symbol. For example, if $q_0$ and $q_1$ are
internal states of some dfa $M$, then the graph associated with $M$ will have one vertex labeled $q_0$ and
another labeled $q_1$. An edge $(q_0, q_1)$ labeled $a$ represents the transition $\delta(q_0, a) = q_1$. The initial state
will be identified by an incoming unlabeled arrow not originating at any vertex. Final states are
drawn with a double circle.


More formally, if $M = (Q, \Sigma, \delta, q_0, F)$ is a deterministic finite accepter, then its associated
transition graph $G_M$ has exactly $|Q|$ vertices, each one labeled with a different $q_i \in Q$. For every
transition rule $\delta(q_i, a) = q_j$, the graph has an edge $(q_i, q_j)$ labeled $a$. The vertex associated with $q_0$ is
called the initial vertex, while those labeled with $q_f \in F$ are the final vertices. It is a trivial matter to
convert from the $(Q, \Sigma, \delta, q_0, F)$ definition of a dfa to its transition graph representation and vice versa.


Example 2.1 


The graph in Figure 2.1 represents the dfa

$$M = (\{q_0, q_1, q_2\}, \{0, 1\}, \delta, q_0, \{q_1\}),$$

where $\delta$ is given by

$$\delta(q_0, 0) = q_0, \quad \delta(q_0, 1) = q_1,$$
$$\delta(q_1, 0) = q_0, \quad \delta(q_1, 1) = q_2,$$
$$\delta(q_2, 0) = q_2, \quad \delta(q_2, 1) = q_1.$$


This dfa accepts the string 01. Starting in state $q_0$, the symbol 0 is read first. Looking at the edges of
the graph, we see that the automaton remains in state $q_0$. Next, the 1 is read and the automaton goes
into state $q_1$. We are now at the end of the string and, at the same time, in a final state $q_1$. Therefore,
the string 01 is accepted. The dfa does not accept the string 00, since after reading two consecutive
0’s, it will be in state $q_0$. By similar reasoning, we see that the automaton will accept the strings 101,
0111, and 11001, but not 100 or 1100.

Figure 2.1 
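
The operation of this dfa can be sketched in a few lines of Python. The dictionary `DELTA` and the function `accepts` are names chosen here for illustration; the entry for state q2 on input 0 is taken to be q2, which is an assumption about the figure, and the strings discussed in the example behave the same way whether it is q2 or q0.

```python
# Transition function of the dfa in Example 2.1, stored as a dictionary.
# Assumption: delta(q2, 0) = q2.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
START = "q0"
FINAL = {"q1"}


def accepts(w: str) -> bool:
    """Run the dfa on w, one symbol per move, and test the last state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL


for w in ["01", "00", "101", "0111", "11001", "100", "1100"]:
    print(w, accepts(w))
```

Because the transition function is total, the dictionary lookup never fails for strings over {0, 1}.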


It is convenient to introduce the extended transition function $\delta^* : Q \times \Sigma^* \to Q$. The second
argument of $\delta^*$ is a string, rather than a single symbol, and its value gives the state the automaton will
be in after reading that string. For example, if

$$\delta(q_0, a) = q_1$$

and

$$\delta(q_1, b) = q_2,$$

then

$$\delta^*(q_0, ab) = q_2.$$


Formally, we can define $\delta^*$ recursively by

$$\delta^*(q, \lambda) = q, \qquad (2.1)$$

$$\delta^*(q, wa) = \delta(\delta^*(q, w), a), \qquad (2.2)$$

for all $q \in Q$, $w \in \Sigma^*$, $a \in \Sigma$. To see why this is appropriate, let us apply these definitions to the
simple case above. First, we use (2.2) to get

$$\delta^*(q_0, ab) = \delta(\delta^*(q_0, a), b). \qquad (2.3)$$

But

$$\delta^*(q_0, a) = \delta(\delta^*(q_0, \lambda), a) = \delta(q_0, a) = q_1.$$

Substituting this into (2.3), we get

$$\delta^*(q_0, ab) = \delta(q_1, b) = q_2,$$


as expected. 
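
The recursive definition translates almost literally into code. In the sketch below, `delta_star` is a name chosen here, the empty Python string stands for λ, and the dictionary holds only the two transitions used in the worked case, so it is deliberately not a total function.

```python
def delta_star(delta: dict, q: str, w: str) -> str:
    """Extended transition function, following (2.1) and (2.2)."""
    if w == "":
        # (2.1): reading the empty string leaves the state unchanged.
        return q
    # (2.2): process the prefix w[:-1] first, then the last symbol.
    return delta[(delta_star(delta, q, w[:-1]), w[-1])]


# Only the two transitions of the worked case; other entries are omitted.
delta = {("q0", "a"): "q1", ("q1", "b"): "q2"}
print(delta_star(delta, "q0", "ab"))
```

The recursion peels symbols off the right end of the string, exactly as (2.2) does.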


Languages and Dfa's 


Having made a precise definition of an accepter, we are now ready to define formally what we mean 
by an associated language. The association is obvious: The language is the set of all the strings 
accepted by the automaton. 


Definition 2.2 


The language accepted by a dfa $M = (Q, \Sigma, \delta, q_0, F)$ is the set of all strings on $\Sigma$ accepted by $M$. In
formal notation,

$$L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \in F\}.$$


Note that we require that $\delta$, and consequently $\delta^*$, be total functions. At each step, a unique move is
defined, so that we are justified in calling such an automaton deterministic. A dfa will process every
string in $\Sigma^*$ and either accept it or not accept it. Nonacceptance means that the dfa stops in a nonfinal
state, so that

$$\overline{L(M)} = \{w \in \Sigma^* : \delta^*(q_0, w) \notin F\}.$$

Example 2.2 


Consider the dfa in Figure 2.2. 


In drawing Figure 2.2 we allowed the use of two labels on a single edge. Such multiply labeled 
edges are shorthand for two or more distinct transitions: The transition is taken whenever the input 
symbol matches any of the edge labels. 


The automaton in Figure 2.2 remains in its initial state $q_0$ until the first $b$ is encountered. If this is
also the last symbol of the input, then the string is accepted. If not, the dfa goes into state $q_2$, from
which it can never escape. The state $q_2$ is a trap state. We see clearly from the graph that the
automaton accepts all strings consisting of an arbitrary number of $a$'s, followed by a single $b$. All
other input strings are rejected. In set notation, the language accepted by the automaton is

$$L = \{a^nb : n \ge 0\}.$$


Figure 2.2


These examples show how convenient transition graphs are for working with finite automata. 
While it is possible to base all arguments strictly on the properties of the transition function and its 
extension through (2.1) and (2.2), the results are hard to follow. In our discussion, we use graphs, 
which are more intuitive, as far as possible. To do so, we must, of course, have some assurance that 
we are not misled by the representation and that arguments based on graphs are as valid as those that 
use the formal properties of $\delta$. The following preliminary result gives us this assurance.


Theorem 2.1 


Let $M = (Q, \Sigma, \delta, q_0, F)$ be a deterministic finite accepter, and let $G_M$ be its associated transition graph.
Then for every $q_i, q_j \in Q$, and $w \in \Sigma^+$, $\delta^*(q_i, w) = q_j$ if and only if there is in $G_M$ a walk with label $w$
from $q_i$ to $q_j$.


Proof: This claim is fairly obvious from an examination of such simple cases as Example 2.1. It can
be proved rigorously using an induction on the length of $w$. Assume that the claim is true for all
strings $v$ with $|v| \le n$. Consider then any $w$ of length $n + 1$ and write it as

$$w = va.$$

Suppose now that $\delta^*(q_i, v) = q_k$. Since $|v| = n$, there must be a walk in $G_M$ labeled $v$ from $q_i$ to $q_k$. But if
$\delta^*(q_i, w) = q_j$, then $M$ must have a transition $\delta(q_k, a) = q_j$, so that by construction $G_M$ has an edge
$(q_k, q_j)$ with label $a$. Thus, there is a walk in $G_M$ labeled $va = w$ between $q_i$ and $q_j$. Since the result is
obviously true for $n = 1$, we can claim by induction that, for every $w \in \Sigma^+$,

$$\delta^*(q_i, w) = q_j \qquad (2.4)$$

implies that there is a walk in $G_M$ from $q_i$ to $q_j$ labeled $w$.

The argument can be turned around in a straightforward way to show that the existence of such a
path implies (2.4), thus completing the proof. ■


Again, the result of the theorem is so intuitively obvious that a formal proof seems unnecessary.
We went through the details for two reasons. The first is that it is a simple, yet typical example of an
inductive proof in connection with automata. The second is that the result will be used over and over,
so stating and proving it as a theorem lets us argue quite confidently using graphs. This makes our
examples and proofs more transparent than they would be if we used the properties of $\delta^*$.


While graphs are convenient for visualizing automata, other representations are also useful. For 
example, we can represent the function 6 as a table. The table in Figure 2.3 is equivalent to Figure 
2.2. Here the row label is the current state, while the column label represents the current input 
symbol. The entry in the table defines the next state. 


It is apparent from this example that a dfa can easily be implemented as a computer program; for 
example, as a simple table-lookup or as a sequence of if statements. The best implementation or 
representation depends on the specific application. Transition graphs are very convenient for the 
kinds of arguments we want to make here, so we use them in most of our discussions. 
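
Both implementation styles mentioned above can be sketched for the automaton of Figure 2.2. The table below is inferred from the description in Example 2.2 (q0 initial, q1 final, q2 a trap state) and is an assumption rather than a copy of Figure 2.3; the function names are likewise chosen here for illustration.

```python
# Table lookup: row = current state, column = input symbol, entry = next state.
TABLE = {
    "q0": {"a": "q0", "b": "q1"},
    "q1": {"a": "q2", "b": "q2"},
    "q2": {"a": "q2", "b": "q2"},
}
FINAL = {"q1"}


def accepts_table(w: str) -> bool:
    state = "q0"
    for symbol in w:
        state = TABLE[state][symbol]
    return state in FINAL


def accepts_ifs(w: str) -> bool:
    """The same automaton written as a sequence of if statements."""
    state = "q0"
    for symbol in w:
        if state == "q0" and symbol == "a":
            state = "q0"
        elif state == "q0" and symbol == "b":
            state = "q1"
        else:
            state = "q2"  # trap state: no way back
    return state == "q1"


for w in ["b", "aab", "aaba", "abb", ""]:
    print(repr(w), accepts_table(w), accepts_ifs(w))
```

The table version separates the automaton's data from the loop that runs it, so a different dfa needs only a different table; the if version hard-codes one automaton.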


In constructing automata for languages defined informally, we employ reasoning similar to that for 
programming in higher-level languages. But the programming of a dfa is tedious and sometimes 
conceptually complicated by the fact that such an automaton has few powerful features. 


Figure 2.3 


Example 2.3 


Find a deterministic finite accepter that recognizes the set of all strings on $\Sigma = \{a,b\}$ starting with the
prefix $ab$.


The only issue here is the first two symbols in the string; after they have been read, no further
decisions are needed. Still, the automaton has to process the whole string before its decision is made.
We can therefore solve the problem with an automaton that has four states: an initial state, two states
for recognizing $ab$ ending in a final trap state, and one nonfinal trap state. If the first symbol is an $a$
and the second is a $b$, the automaton goes to the final trap state, where it will stay since the rest of the
input does not matter. On the other hand, if the first symbol is not an $a$ or the second one is not a $b$, the
automaton enters the nonfinal trap state. The simple solution is shown in Figure 2.4.


Figure 2.4


Example 2.4 


Find a dfa that accepts all the strings on {0,1}, except those containing the substring 001. 


In deciding whether the substring 001 has occurred, we need to know not only the current input 
symbol, but we also need to remember whether or not it has been preceded by one or two 0’s. We can 
keep track of this by putting the automaton into specific states and labeling them accordingly. Like 
variable names in a programming language, state names are arbitrary and can be chosen for mnemonic 
reasons. For example, the state in which two 0’s were the immediately preceding symbols can be 
labeled simply 00. 


If the string starts with 001, then it must be rejected. This implies that there must be a path labeled 
001 from the initial state to a nonfinal state. For convenience, this nonfinal state is labeled 001. This 
state must be a trap state, because later symbols do not matter. All other states are accepting states. 


This gives us the basic structure of the solution, but we still must add provisions for the substring
001 occurring in the middle of the input. We must define $Q$ and $\delta$ so that whatever we need to make
the correct decision is remembered by the automaton. In this case, when a symbol is read, we need to
know some part of the string to the left, for example, whether or not the two previous symbols were
00. If we label the states with the relevant symbols, it is very easy to see what the transitions must be.
For example,

$$\delta(00, 0) = 00$$


because this situation arises only if there are three consecutive 0’s. We are only interested in the last 
two, a fact we remember by keeping the dfa in the state 00. A complete solution is shown in Figure 
2.5. We see from this example how useful mnemonic labels on the states are for keeping track of 
things. Trace a few strings, such as 100100 and 1010100, to see that the solution is indeed correct. 
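
One way to check such a design is to write it down with the mnemonic state names and compare it against a direct substring test. The transition table below is a sketch reconstructed from the reasoning in this example, not a transcription of Figure 2.5; the state name `start` and the function name `accepts` are assumptions.

```python
# States are named after the relevant recent input, as suggested in the text.
DELTA = {
    "start": {"0": "0", "1": "start"},
    "0": {"0": "00", "1": "start"},
    "00": {"0": "00", "1": "001"},
    "001": {"0": "001", "1": "001"},  # nonfinal trap state
}
FINAL = {"start", "0", "00"}


def accepts(w: str) -> bool:
    state = "start"
    for symbol in w:
        state = DELTA[state][symbol]
    return state in FINAL


# The two strings suggested for tracing in the text.
for w in ["100100", "1010100"]:
    print(w, accepts(w), "001" not in w)
```

For each string, the two printed values should agree, since the dfa is meant to accept exactly the strings without the substring 001.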


Figure 2.5 




Regular Languages 


Every finite automaton accepts some language. If we consider all possible finite automata, we get a 
set of languages associated with them. We will call such a set of languages a family. The family of 
languages that is accepted by deterministic finite accepters is quite limited. The structure and 
properties of the languages in this family will become clearer as our study proceeds; for the moment 
we will simply attach a name to this family. 


Definition 2.3 


A language L is called regular if and only if there exists some deterministic finite accepter M such 
that 


L= L(M). 


Example 2.5 


Show that the language

$$L = \{awa : w \in \{a,b\}^*\}$$

is regular.


To show that this or any other language is regular, all we have to do is find a dfa for it. The
construction of a dfa for this language is similar to Example 2.3, but a little more complicated. What
this dfa must do is check whether a string begins and ends with an $a$; what is between is immaterial.
The solution is complicated by the fact that there is no explicit way of testing the end of the string.
This difficulty is overcome by simply putting the dfa into a final state whenever the second $a$ is
encountered. If this is not the end of the string, and another $b$ is found, it will take the dfa out of the
final state. Scanning continues in this way, each $a$ taking the automaton back to its final state. The
complete solution is shown in Figure 2.6. Again, trace a few examples to see why this works. After
one or two tests, it will be obvious that the dfa accepts a string if and only if it begins and ends with
an $a$. Since we have constructed a dfa for the language, we can claim that, by definition, the language
is regular.


Figure 2.6


Example 2.6 


Let $L$ be the language in Example 2.5. Show that $L^2$ is regular. Again we show that the language is
regular by constructing a dfa for it. We can write an explicit expression for $L^2$, namely,

$$L^2 = \{aw_1aaw_2a : w_1, w_2 \in \{a,b\}^*\}.$$

Therefore, we need a dfa that recognizes two consecutive strings of essentially the same form (but not
necessarily identical in value). The diagram in Figure 2.6 can be used as a starting point, but the
vertex $q_3$ has to be modified. This state can no longer be final since, at this point, we must start to
look for a second substring of the form $awa$. To recognize the second substring, we replicate the
states of the first part (with new names), with $q_3$ as the beginning of the second part. Since the
complete string can be broken into its constituent parts wherever $aa$ occurs, we let the first
occurrence of two consecutive $a$’s be the trigger that gets the automaton into its second part. We can
do this by making $\delta(q_3, a) = q_4$. The complete solution is in Figure 2.7. This dfa accepts $L^2$, which is
therefore regular.


Figure 2.7


The last example suggests the conjecture that if a language $L$ is regular, so are $L^2, L^3, \ldots$. We will
see later that this is indeed correct.


EXERCISES 


1. Which of the strings 0001, 01001, 0000110 are accepted by the dfa in Figure 2.1? 


2. For $\Sigma = \{a,b\}$, construct dfa's that accept the sets consisting of
(a) all strings with exactly one a, 
(b) all strings with at least one a, 
(c) all strings with no more than three a's, 
(d) all strings with at least one a and exactly two b’s, 
(e) all the strings with exactly two a’s and more than two b’s. 
3. Show that if we change Figure 2.6, making $q_3$ a nonfinal state and making $q_0$, $q_1$, $q_2$ final states, the
resulting dfa accepts $\overline{L}$.
4. Generalize the observation in the previous exercise. Specifically, show that if $M = (Q, \Sigma, \delta, q_0, F)$ and
$\widehat{M} = (Q, \Sigma, \delta, q_0, Q - F)$ are two dfa's, then $\overline{L(M)} = L(\widehat{M})$.
5. Give dfa's for the languages
(a)L= {ab-wh*: we {a,b}", 
(b) $L = \{ab^na^m : n > 2, m > 3\}$,
(c) $L = \{w_1abw_2 : w_1 \in \{a,b\}^*, w_2 \in \{a,b\}^*\}$,
(d) $L = \{ba^n : n > 1, n \neq 5\}$.
6. With È = {a,b}, give a dfa for L= w,aw,: |w È 3, [wy 5}. 
7. Find dfa's for the following languages on $\Sigma = \{a,b\}$.
(a) $L = \{w : |w| \bmod 3 = 0\}$.
(b) $L = \{w : |w| \bmod 5 \neq 0\}$.
(c) $L = \{w : n_a(w) \bmod 3 > 1\}$.
(d) $L = \{w : n_a(w) \bmod 3 > n_b(w) \bmod 3\}$.
(e) $L = \{w : (n_a(w) - n_b(w)) \bmod 3 > 0\}$.
(f) $L = \{w : (n_a(w) + 2n_b(w)) \bmod 3 < 2\}$.
(g) $L = \{w : |w| \bmod 3 = 0, |w| \neq 6\}$.


* 8. A run in a string is a substring of length at least two, as long as possible and consisting entirely 
of the same symbol. For instance, the string abbbaab contains a run of b's of length three and a run 
of a's of length two. Find dfa's for the following languages on {a,b}. 


(a) L= {w: w contains no runs of length less than four}. 
(b) L= {w: every run of a’s has length either two or three}. 


(c) L= {w: there are at most two runs of a’s of length three}. 


(d) L= {w: there are exactly two runs of a’s of length 3}. 


9. Consider the set of strings on {0,1} defined by the requirements below. For each, construct an 
accepting dfa. 


(a) Every 00 is followed immediately by a 1. For example, the strings 101, 0010, 0010011001 
are in the language, but 0001 and 00100 are not. 


(b) All strings containing 00 but not 000. 
(c) The leftmost symbol differs from the rightmost one. 


(d) Every substring of four symbols has at most two 0’s. For example, 001110 and 011001 are 
in the language, but 10010 is not since one of its substrings, 0010, contains three zeros. 


(e) All strings of length five or more in which the fourth symbol from the right end is different 
from the leftmost symbol. 


(f) All strings in which the leftmost two symbols and the rightmost two symbols are identical. 


(g) All strings of length four or greater in which the leftmost three symbols are the same, but 
different from the rightmost symbol. 


* 10. Construct a dfa that accepts strings on {0,1} if and only if the value of the string, interpreted as 
a binary representation of an integer, is zero modulo five. For example, 0101 and 1111, 
representing the integers 5 and 15, respectively, are to be accepted. 


11. Show that the language $L = \{vwv : v, w \in \{a,b\}^*, |v| = 2\}$ is regular.

12. Show that $L = \{a^n : n > 4\}$ is regular.

13. Show that the language $L = \{a^n : n > 0, n \neq 4\}$ is regular.

14. Show that the language $L = \{a^n : n$ is either a multiple of three or a multiple of 5$\}$ is regular.

15. Show that the language $L = \{a^n : n$ is a multiple of three, but not a multiple of 5$\}$ is regular.

16. Show that the set of all real numbers in C is a regular language.

17. Show that if $L$ is regular, so is $L - \{\lambda\}$.

18. Show that if $L$ is regular, so is $L \cup \{a\}$, for all $a \in \Sigma$.

19. Use (2.1) and (2.2) to show that

$$\delta^*(q, wv) = \delta^*(\delta^*(q, w), v)$$

for all $w, v \in \Sigma^*$.

20. Let $L$ be the language accepted by the automaton in Figure 2.2. Find a dfa that accepts $L^2$.

21. Let $L$ be the language accepted by the automaton in Figure 2.2. Find a dfa for the language $L^2 - L$.


22. Let L be the language in Example 2.5. Show that L* is regular. 


23. Let $G_M$ be the transition graph for some dfa $M$. Prove the following.

(a) If $L(M)$ is infinite, then $G_M$ must have at least one cycle for which there is a path from the
initial vertex to some vertex in the cycle and a path from some vertex in the cycle to some final
vertex.

(b) If $L(M)$ is finite, then no such cycle exists.


24. Let us define an operation truncate, which removes the rightmost symbol from any string. For 
example, truncate (aaaba) is aaab. The operation can be extended to languages by 


truncate (L)= {truncate(w):w € L} 


Show how, given a dfa for any regular language L, one can construct a dfa for truncate (L). 
From this, prove that if $L$ is a regular language not containing $\lambda$, then truncate (L) is also regular.


25. While the language accepted by a given dfa is unique, there are normally many dfa's that accept 
a language. Find a dfa with exactly six states that accepts the same language as the dfa in Figure 
2.4. 


26. Can you find a dfa with three states that accepts the language of the dfa in Figure 2.4? If not, can 
you give convincing arguments that no such dfa can exist? 


2.2 Nondeterministic Finite Accepters 


Finite accepters are more complicated if we allow them to act nondeterministically. Nondeterminism 
is a powerful but, at first sight, unusual idea. We normally think of computers as completely 
deterministic, and the element of choice seems out of place. Nevertheless, nondeterminism is a useful 
notion, as we shall see as we proceed. 


Definition of a Nondeterministic Accepter 

Nondeterminism means a choice of moves for an automaton. Rather than prescribing a unique move in
each situation, we allow a set of possible moves. Formally, we achieve this by defining the transition
function so that its range is a set of possible states.


Definition 2.4 


A nondeterministic finite accepter or nfa is defined by the quintuple

$$M = (Q, \Sigma, \delta, q_0, F),$$

where $Q$, $\Sigma$, $q_0$, $F$ are defined as for deterministic finite accepters, but

$$\delta : Q \times (\Sigma \cup \{\lambda\}) \to 2^Q.$$


Note that there are three major differences between this definition and the definition of a dfa. In a
nondeterministic accepter, the range of $\delta$ is in the powerset $2^Q$, so that its value is not a single
element of $Q$ but a subset of it. This subset defines the set of all possible states that can be reached by
the transition. If, for instance, the current state is $q_1$, the symbol $a$ is read, and

$$\delta(q_1, a) = \{q_0, q_2\},$$

then either $q_0$ or $q_2$ could be the next state of the nfa. Also, we allow $\lambda$ as the second argument of $\delta$.
This means that the nfa can make a transition without consuming an input symbol. Although we still
assume that the input mechanism can only travel to the right, it is possible that it is stationary on some
moves. Finally, in an nfa, the set $\delta(q_i, a)$ may be empty, meaning that there is no transition defined for
this specific situation.

Like dfa's, nondeterministic accepters can be represented by transition graphs. The vertices are
determined by $Q$, while an edge $(q_i, q_j)$ with label $a$ is in the graph if and only if $\delta(q_i, a)$ contains $q_j$.
Note that since $a$ may be the empty string, there can be some edges labeled $\lambda$.

A string is accepted by an nfa if there is some sequence of possible moves that will put the 
machine in a final state at the end of the string. A string is rejected (that is, not accepted) only if there 
is no possible sequence of moves by which a final state can be reached. Nondeterminism can 
therefore be viewed as involving “intuitive” insight by which the best move can be chosen at every 
state (assuming that the nfa wants to accept every string). 


Example 2.7 


Consider the transition graph in Figure 2.8. It describes a nondeterministic accepter since there are
two transitions labeled $a$ out of $q_0$.


Figure 2.8


Example 2.8


A nondeterministic automaton is shown in Figure 2.9. It is nondeterministic not only because several
edges with the same label originate from one vertex, but also because it has a $\lambda$-transition. Some
transitions, such as $\delta(q_2, 0)$, are unspecified in the graph. This is to be interpreted as a transition to
the empty set, that is, $\delta(q_2, 0) = \varnothing$. The automaton accepts strings $\lambda$, 1010, and 101010, but not 110
and 10100. Note that for 10 there are two alternative walks, one leading to $q_0$, the other to $q_2$. Even
though $q_2$ is not a final state, the string is accepted because one walk leads to a final state.


Figure 2.9


Again, the transition function can be extended so its second argument is a string. We require of the
extended transition function $\delta^*$ that if

$$\delta^*(q_i, w) = Q_j,$$

then $Q_j$ is the set of all possible states the automaton may be in,
having started in state $q_i$ and having read $w$. A recursive definition of $\delta^*$, analogous to (2.1) and (2.2),
is possible, but not particularly enlightening. A more easily appreciated definition can be made
through transition graphs.

Definition 2.5 


For an nfa, the extended transition function is defined so that $\delta^*(q_i, w)$ contains $q_j$ if and only if there
is a walk in the transition graph from $q_i$ to $q_j$ labeled $w$. This holds for all $q_i, q_j \in Q$, and $w \in \Sigma^*$.


Example 2.9 


Figure 2.10 represents an nfa. It has several $\lambda$-transitions and some undefined transitions such as
$\delta(q_2, a)$.


Figure 2.10


Suppose we want to find $\delta^*(q_1, a)$ and $\delta^*(q_2, \lambda)$. There is a walk labeled $a$ involving two $\lambda$-transitions
from $q_1$ to itself. By using some of the $\lambda$-edges twice, we see that there are also walks
involving $\lambda$-transitions to $q_0$ and $q_2$. Thus,

$$\delta^*(q_1, a) = \{q_0, q_1, q_2\}.$$

Since there is a $\lambda$-edge between $q_2$ and $q_0$, we have immediately that $\delta^*(q_2, \lambda)$ contains $q_0$. Also, since
any state can be reached from itself by making no move, and consequently using no input
symbol, $\delta^*(q_2, \lambda)$ also contains $q_2$. Therefore,

$$\delta^*(q_2, \lambda) = \{q_0, q_2\}.$$

Using as many $\lambda$-transitions as needed, you can also check that

$$\delta^*(q_2, aa) = \{q_0, q_1, q_2\}.$$


The definition of $\delta^*$ through labeled walks is somewhat informal, so it is useful to look at it a little
more closely. Definition 2.5 is proper, since between any vertices $v_i$ and $v_j$ there is either a walk
labeled $w$ or there is not, indicating that $\delta^*$ is completely defined. What is perhaps a little harder to
see is that this definition can always be used to find $\delta^*(q_i, w)$.


In Section 1.1, we described an algorithm for finding all simple paths between two vertices. We 
cannot use this algorithm directly since, as Example 2.9 shows, a labeled walk is not always a simple 
path. We can modify the simple path algorithm, removing the restriction that no vertex or edge can be 
repeated. The new algorithm will now generate successively all walks of length one, length two, 
length three, and so on. 

There is still a difficulty. Given a $w$, how long can a walk labeled $w$ be? This is not immediately
obvious. In Example 2.9, the walk labeled $a$ between $q_1$ and $q_2$ has length four. The problem is
caused by the $\lambda$-transitions, which lengthen the walk but do not contribute to the label. The situation is
saved by this observation: If between two vertices $v_i$ and $v_j$ there is any walk labeled $w$, then there
must be some walk labeled $w$ of length no more than $\Lambda + (1 + \Lambda)|w|$, where $\Lambda$ is the number of
$\lambda$-edges in the graph. The argument for this is: While $\lambda$-edges may be repeated, there is always a walk
in which every repeated $\lambda$-edge is separated by an edge labeled with a nonempty symbol. Otherwise,
the walk contains a cycle labeled $\lambda$, which can be replaced by a simple path without changing the
label of the walk. We leave a formal proof of this claim as an exercise.


With this observation, we have a method for computing $\delta^*(q_i, w)$. We evaluate all walks of length
at most $\Lambda + (1 + \Lambda)|w|$ originating at $q_i$. We select from them those that are labeled $w$. The terminating
vertices of the selected walks are the elements of the set $\delta^*(q_i, w)$.
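
The bounded search can be sketched as follows. The edge list is an assumed reconstruction of the graph in Figure 2.10, chosen only because it is consistent with the three values computed in Example 2.9; the actual figure may contain more edges. The name `delta_star` is also chosen here. Instead of storing whole walks, the sketch records for each partial walk only its end vertex and how many symbols of $w$ it has matched, which is all that is needed to decide where walks labeled $w$ can end.

```python
LAMBDA = ""  # label used for lambda-edges

# Assumed edges (source, label, target), consistent with Example 2.9.
EDGES = [("q0", "a", "q1"), ("q1", LAMBDA, "q2"), ("q2", LAMBDA, "q0")]


def delta_star(q: str, w: str) -> set:
    """End vertices of all walks from q labeled w, up to the length bound."""
    n_lambda = sum(1 for _, label, _ in EDGES if label == LAMBDA)
    bound = n_lambda + (1 + n_lambda) * len(w)
    frontier = {(q, 0)}  # (end vertex, symbols of w matched so far)
    reached = {q} if w == "" else set()  # the walk of length zero
    for _ in range(bound):
        extended = set()
        for state, k in frontier:
            for src, label, dst in EDGES:
                if src != state:
                    continue
                if label == LAMBDA:
                    extended.add((dst, k))  # no input consumed
                elif k < len(w) and w[k] == label:
                    extended.add((dst, k + 1))  # next symbol matched
        frontier = extended
        reached |= {state for state, k in frontier if k == len(w)}
    return reached


print(sorted(delta_star("q1", "a")))
print(sorted(delta_star("q2", "")))
print(sorted(delta_star("q2", "aa")))
```

The loop runs exactly as many rounds as the bound allows, so it always terminates even though λ-edges form a cycle in this graph.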

As we have remarked, it is possible to define $\delta^*$ in a recursive fashion as was done for the
deterministic case. The result is unfortunately not very transparent, and arguments with the extended
transition function defined this way are hard to follow. We prefer to use the more intuitive and more
manageable alternative in Definition 2.5.


As for dfa's, the language accepted by an nfa is defined formally by the extended transition
function.

Definition 2.6 


The language L accepted by an nfa $M = (Q, \Sigma, \delta, q_0, F)$ is defined as the set of all strings accepted in
the above sense. Formally,


$$L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \cap F \neq \varnothing\}.$$


In words, the language consists of all strings w for which there is a walk labeled w from the initial 
vertex of the transition graph to some final vertex. 


Example 2.10 


What is the language accepted by the automaton in Figure 2.9? It is easy to see from the graph that the 
only way the nfa can stop in a final state is if the input is either a repetition of the string 10 or the 
empty string. Therefore, the automaton accepts the language $L = \{(10)^n : n \ge 0\}$.


What happens when this automaton is presented with the string w = 110? After reading the prefix
11, the automaton finds itself in state $q_2$, with the transition $\delta(q_2, 0)$ undefined. We call such a
situation a dead configuration, and we can visualize it as the automaton simply stopping without
further action. But we must always keep in mind that such visualizations are imprecise and carry with
them some danger of misinterpretation. What we can say precisely is that


$$\delta^*(q_0, 110) = \varnothing.$$


Thus, no final state can be reached by processing w = 110, and hence the string is not accepted.


Why Nondeterminism? 


In reasoning about nondeterministic machines, we should be quite cautious in using intuitive notions. 
Intuition can easily lead us astray, and we must be able to give precise arguments to substantiate our 
conclusions. Nondeterminism is a difficult concept. Digital computers are completely deterministic; 
their state at any time is uniquely predictable from the input and the initial state. Thus it is natural to 
ask why we study nondeterministic machines at all. We are trying to model real systems, so why 
include such nonmechanical features as choice? We can answer this question in various ways. 


Many deterministic algorithms require that one make a choice at some stage. A typical example is 
a game-playing program. Frequently, the best move is not known, but can be found using an 
exhaustive search with backtracking. When several alternatives are possible, we choose one and 
follow it until it becomes clear whether or not it was best. If not, we retreat to the last decision point 
and explore the other choices. A nondeterministic algorithm that can make the best choice would be 
able to solve the problem without backtracking, but a deterministic one can simulate nondeterminism
with some extra work. For this reason, nondeterministic machines can serve as models of
search-and-backtrack algorithms.
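The search-and-backtrack idea can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the nfa below is hypothetical (it accepts binary strings ending in 01), it has no $\lambda$-transitions, and the function name `accepts` is chosen only for illustration. Each time the nfa has several possible moves, the deterministic program tries them one after another and retreats to the last decision point when a choice leads nowhere.

```python
# Hypothetical nfa without lambda-transitions (an assumption for this sketch):
# it accepts exactly the binary strings that end in "01".
DELTA = {
    ("q0", "0"): ["q0", "q1"],   # the choice point
    ("q0", "1"): ["q0"],
    ("q1", "1"): ["q2"],
}
FINAL = {"q2"}

def accepts(state, w, delta=DELTA, final=FINAL):
    """Depth-first search with backtracking over the nfa's choices."""
    if not w:
        return state in final
    for nxt in delta.get((state, w[0]), []):
        if accepts(nxt, w[1:], delta, final):
            return True      # this choice leads to acceptance
    return False             # dead end: the caller tries its next choice
```

On the input 1101 the search first stays in q0 after reading the 0, fails, backs up, and then succeeds with the move to q1.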


Nondeterminism is sometimes helpful in solving problems easily. Look at the nfa in Figure 2.8. It 
is clear that there is a choice to be made. The first alternative leads to the acceptance of the string $a^3$,
while the second accepts all strings with an even number of a's. The language accepted by the nfa is
$\{a^3\} \cup \{a^{2n} : n \ge 1\}$. While it is possible to find a dfa for this language, the nondeterminism is quite
natural. The language is the union of two quite different sets, and the nondeterminism lets us decide at 
the outset which case we want. The deterministic solution is not as obviously related to the definition, 
and so is a little harder to find. As we go on, we will see other and more convincing examples of the 
usefulness of nondeterminism. 


In the same vein, nondeterminism is an effective mechanism for describing some complicated
languages concisely. Notice that the definition of a grammar involves a nondeterministic element. In


$$S \to aSb \mid \lambda$$


we can at any point choose either the first or the second production. This lets us specify many
different strings using only two rules.


Finally, there is a technical reason for introducing nondeterminism. As we will see, certain 
theoretical results are more easily established for nfa's than for dfa's. Our next major result indicates 
that there is no essential difference between these two types of automata. Consequently, allowing 
nondeterminism often simplifies formal arguments without affecting the generality of the conclusion. 


EXERCISES 


1. Prove in detail the claim made in the previous section that if in a transition graph there is a walk
labeled w, there must be some walk labeled w of length no more than $\Lambda + (1 + \Lambda)|w|$.

2. Find a dfa that accepts the language defined by the nfa in Figure 2.8.

3. Find a dfa that accepts the complement of the language defined by the nfa in Figure 2.8.

4. In Figure 2.9, find $\delta^*(q_0, 1011)$ and $\delta^*(q_1, 01)$.

5. In Figure 2.10, find $\delta^*(q_0, a)$ and $\delta^*(q_1, \lambda)$.

6. For the nfa in Figure 2.9, find $\delta^*(q_0, 1010)$ and $\delta^*(q_1, 00)$.

7. Design an nfa with no more than five states for the set $\{abab^n : n \ge 0\} \cup \{aba^n : n \ge 0\}$.

8. Construct an nfa with three states that accepts the language $\{ab, abc\}^*$.

9. Do you think Exercise 8 can be solved with fewer than three states?


10. (a) Find an nfa with three states that accepts the language


$$L = \{a^n : n \ge 1\} \cup \{b^m a^k : m \ge 0, k \ge 0\}.$$


(b) Do you think the language in part (a) can be accepted by an nfa with fewer than three 
states? 


11. Find an nfa with four states for $L = \{a^n : n \ge 0\} \cup \{b^n a : n \ge 1\}$.

12. Which of the strings 00, 01001, 10010, 000, 0000 are accepted by the following nfa?


[Figure: transition graph of the nfa for Exercise 12]


13. What is the complement of the language accepted by the nfa in Figure 2.10? 
14. Let L be the language accepted by the nfa in Figure 2.8. Find an nfa that accepts $L \cup \{a^5\}$.
15. Give a simple description of the language in Exercise 13. 


16. Find an nfa that accepts $\{a\}^*$ and is such that if in its transition graph a single edge is removed
(without any other changes), the resulting automaton accepts {a}.


17. Can Exercise 16 be solved using a dfa? If so, give the solution; if not, give convincing arguments 
for your conclusion. 


18. Consider the following modification of Definition 2.6. An nfa with multiple initial states is
defined by the quintuple


$$M = (Q, \Sigma, \delta, Q_0, F),$$


where $Q_0 \subseteq Q$ is a set of possible initial states. The language accepted by such an automaton is
defined as


$$L(M) = \{w : \delta^*(q_0, w) \text{ contains } q_f, \text{ for any } q_0 \in Q_0, q_f \in F\}.$$


Show that for every nfa with multiple initial states there exists an nfa with a single initial state
that accepts the same language.


19. Suppose that in Exercise 18 we made the restriction $Q_0 \cap F = \varnothing$. Would this affect the
conclusion?


20. Use Definition 2.5 to show that for any nfa


$$\delta^*(q, wv) = \bigcup_{p \in \delta^*(q, w)} \delta^*(p, v),$$


for all $q \in Q$ and all $w, v \in \Sigma^*$.


21. An nfa in which (a) there are no $\lambda$-transitions, and (b) for all $q \in Q$ and all $a \in \Sigma$, $\delta(q, a)$ contains
at most one element, is sometimes called an incomplete dfa. This is reasonable since the
conditions make it such that there is never any choice of moves.


For $\Sigma = \{a, b\}$, convert the incomplete dfa below into a standard dfa.


22. Let L be a regular language on some alphabet $\Sigma$, and let $\Sigma_1 \subset \Sigma$ be a smaller alphabet. Consider
$L_1$, the subset of L whose elements are made up only of symbols from $\Sigma_1$, that is,


$$L_1 = L \cap \Sigma_1^*.$$


Show that $L_1$ is also regular.


2.3 Equivalence of Deterministic and Nondeterministic Finite Accepters


We now come to a fundamental question. In what sense are dfa's and nfa's different? Obviously, there 
is a difference in their definition, but this does not imply that there is any essential distinction 
between them. To explore this question, we introduce the concept of equivalence between automata. 


Definition 2.7 


Two finite accepters, $M_1$ and $M_2$, are said to be equivalent if


$$L(M_1) = L(M_2),$$


that is, if they both accept the same language.


As mentioned, there are generally many accepters for a given language, so any dfa or nfa has many
equivalent accepters.


Example 2.11


The dfa shown in Figure 2.11 is equivalent to the nfa in Figure 2.9 since they both accept the language
$\{(10)^n : n \ge 0\}$.


Figure 2.11 


When we compare different classes of automata, the question invariably arises whether one class 
is more powerful than the other. By “more powerful” we mean that an automaton of one kind can 
achieve something that cannot be done by any automaton of the other kind. Let us look at this question 
for finite accepters. Since a dfa is in essence a restricted kind of nfa, it is clear that any language that 
is accepted by a dfa is also accepted by some nfa. But the converse is not so obvious. We have added 
nondeterminism, so it is at least conceivable that there is a language accepted by some nfa for which, 
in principle, we cannot find a dfa. But it turns out that this is not so. The classes of dfa's and nfa's are 
equally powerful: For every language accepted by some nfa there is a dfa that accepts the same 
language. 

This result is not obvious and certainly has to be demonstrated. The argument, like most 
arguments in this book, will be constructive. This means that we can actually give a way of 
converting any nfa into an equivalent dfa. The construction is not hard to understand; once the idea is 
clear it becomes the starting point for a rigorous argument. The rationale for the construction is the 
following. After an nfa has read a string w, we may not know exactly what state it will be in, but we
can say that it must be in one state of a set of possible states, say $\{q_i, q_j, \ldots, q_k\}$. An equivalent dfa
after reading the same string must be in some definite state. How can we make these two situations
correspond? The answer is a nice trick: Label the states of the dfa with a set of states in such a way
that, after reading w, the equivalent dfa will be in a single state labeled $\{q_i, q_j, \ldots, q_k\}$. Since for a set
of $|Q|$ states there are exactly $2^{|Q|}$ subsets, the corresponding dfa will have a finite number of states.


Most of the work in this suggested construction lies in the analysis of the nfa to get the 
correspondence between possible states and inputs. Before getting to the formal description of this, 
let us illustrate it with a simple example. 


Example 2.12 


Convert the nfa in Figure 2.12 to an equivalent dfa. The nfa starts in state $q_0$, so the initial state of the
dfa will be labeled $\{q_0\}$. After reading an a, the nfa can be in state $q_1$ or, by making a $\lambda$-transition, in
state $q_2$. Therefore, the corresponding dfa must have a state labeled $\{q_1, q_2\}$ and a transition


$$\delta(\{q_0\}, a) = \{q_1, q_2\}.$$


In state $q_0$, the nfa has no specified transition when the input is b; therefore,


$$\delta(\{q_0\}, b) = \varnothing.$$


A state labeled $\varnothing$ represents an impossible move for the nfa and, therefore, means nonacceptance of
the string. Consequently, this state in the dfa must be a nonfinal trap state.


Figure 2.12 


We have now introduced into the dfa the state $\{q_1, q_2\}$, so we need to find the transitions out of
this state. Remember that this state of the dfa corresponds to two possible states of the nfa, so we
must refer back to the nfa. If the nfa is in state $q_1$ and reads an a, it can go to $q_1$. Furthermore, from $q_1$
the nfa can make a $\lambda$-transition to $q_2$. If, for the same input, the nfa is in state $q_2$, then there is no
specified transition. Therefore,


$$\delta(\{q_1, q_2\}, a) = \{q_1, q_2\}.$$


Similarly,


$$\delta(\{q_1, q_2\}, b) = \{q_0\}.$$


At this point, every state has all transitions defined. The result, shown in Figure 2.13, is a dfa,
equivalent to the nfa with which we started. The nfa in Figure 2.12 accepts any string for which
$\delta^*(q_0, w)$ contains $q_1$. For the corresponding dfa to accept every such w, any state whose label includes
$q_1$ must be made a final state.


Figure 2.13 


Theorem 2.2 


Let L be the language accepted by a nondeterministic finite accepter $M_N = (Q_N, \Sigma, \delta_N, q_0, F_N)$. Then
there exists a deterministic finite accepter $M_D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$ such that


$$L = L(M_D).$$


Proof: Given $M_N$, we use the procedure nfa-to-dfa below to construct the transition graph $G_D$ for
$M_D$. To understand the construction, remember that $G_D$ has to have certain properties. Every vertex
must have exactly $|\Sigma|$ outgoing edges, each labeled with a different element of $\Sigma$. During the
construction, some of the edges may be missing, but the procedure continues until they are all there.


procedure: nfa-to-dfa


1. Create a graph $G_D$ with vertex $\{q_0\}$. Identify this vertex as the initial vertex.


2. Repeat the following steps until no more edges are missing.


Take any vertex $\{q_i, q_j, \ldots, q_k\}$ of $G_D$ that has no outgoing edge for some $a \in \Sigma$. Compute
$\delta_N^*(q_i, a), \delta_N^*(q_j, a), \ldots, \delta_N^*(q_k, a)$. If


$$\delta_N^*(q_i, a) \cup \delta_N^*(q_j, a) \cup \cdots \cup \delta_N^*(q_k, a) = \{q_l, q_m, \ldots, q_n\},$$


create a vertex for $G_D$ labeled $\{q_l, q_m, \ldots, q_n\}$ if it does not already exist. Add to $G_D$ an edge from
$\{q_i, q_j, \ldots, q_k\}$ to $\{q_l, q_m, \ldots, q_n\}$ and label it with a.


3. Every state of $G_D$ whose label contains any $q_f \in F_N$ is identified as a final vertex.

4. If $M_N$ accepts $\lambda$, the vertex $\{q_0\}$ in $G_D$ is also made a final vertex.
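The four steps of the procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: dfa vertices are represented as frozensets of nfa states, the empty string "" stands for $\lambda$, and the function and variable names are hypothetical. The sample transitions are chosen to be consistent with what Example 2.12 states about its nfa; the placement of the edge labeled b (assumed here to leave $q_2$) is an assumption, since the figure itself is not reproduced.

```python
from collections import deque

def lambda_closure(states, delta):
    """States reachable from `states` through lambda-edges only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, ""), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return frozenset(closure)

def nfa_to_dfa(alphabet, delta, start, finals):
    """Return (vertices, dfa_delta, dfa_start, dfa_finals)."""
    dfa_start = frozenset({start})              # step 1: initial vertex {q0}
    dfa_delta, seen = {}, {dfa_start}
    queue = deque([dfa_start])
    while queue:                                # step 2: add missing edges
        label = queue.popleft()
        for a in alphabet:
            target = set()
            for q in label:
                # lambda-moves, then one edge labeled a
                for p in lambda_closure({q}, delta):
                    target |= delta.get((p, a), set())
            target = lambda_closure(target, delta)   # trailing lambda-moves
            dfa_delta[(label, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {s for s in seen if s & finals}     # step 3
    if lambda_closure({start}, delta) & finals:      # step 4
        dfa_finals.add(dfa_start)
    return seen, dfa_delta, dfa_start, dfa_finals

# Assumed transitions, consistent with the text of Example 2.12.
NFA_DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"},
    ("q1", ""): {"q2"},
    ("q2", "b"): {"q0"},
}
STATES, DFA_DELTA, START, FINALS = nfa_to_dfa(["a", "b"], NFA_DELTA, "q0", {"q1"})
```

Under these assumptions the construction produces three vertices, $\{q_0\}$, $\{q_1, q_2\}$, and the trap state $\varnothing$, which agrees with the transitions derived in Example 2.12.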


It is clear that this procedure always terminates. Each pass through the loop in Step 2 adds an
edge to $G_D$. But $G_D$ has at most $2^{|Q_N|}|\Sigma|$ edges, so that the loop eventually stops. To show that the
construction also gives the correct answer, we argue by induction on the length of the input string.

Assume that for every v of length less than or equal to n, the presence in $G_N$ of a walk labeled v
from $q_0$ to $q_i$ implies that in $G_D$ there is a walk labeled v from $\{q_0\}$ to a state $Q_i = \{\ldots, q_i, \ldots\}$.
Consider now any w = va and look at a walk in $G_N$ labeled w from $q_0$ to $q_l$. There must then be a walk
labeled v from $q_0$ to $q_i$ and an edge (or a sequence of edges) labeled a from $q_i$ to $q_l$. By the inductive
assumption, in $G_D$ there will be a walk labeled v from $\{q_0\}$ to $Q_i$. But by construction, there will be
an edge from $Q_i$ to some state whose label contains $q_l$. Thus, the inductive assumption holds for all
strings of length n + 1. As it is obviously true for n = 1, it is true for all n. The result then is that
whenever $\delta_N^*(q_0, w)$ contains a final state $q_f$, so does the label of $\delta_D^*(\{q_0\}, w)$. To complete the proof, we
reverse the argument to show that if the label of $\delta_D^*(\{q_0\}, w)$ contains $q_f$, so must $\delta_N^*(q_0, w)$. ■


The arguments in this proof, although correct, are admittedly somewhat terse, showing only the 
major steps. We will follow this practice in the rest of the book, emphasizing the basic ideas in a 
proof and omitting minor details, which you may want to fill in yourself. 


The construction in the previous proof is tedious but important. Let us do another example to make 
sure we understand all the steps. 


Example 2.13 


Convert the nfa in Figure 2.14 into an equivalent deterministic machine. 

Since $\delta_N(q_0, 0) = \{q_0, q_1\}$, we introduce the state $\{q_0, q_1\}$ in $G_D$ and add an edge labeled 0 between
$\{q_0\}$ and $\{q_0, q_1\}$. In the same way, considering $\delta_N(q_0, 1) = \{q_1\}$ gives us the new state $\{q_1\}$ and an
edge labeled 1 between it and $\{q_0\}$.


There are now a number of missing edges, so we continue, using the construction of Theorem 2.2.
Looking at the state $\{q_0, q_1\}$, we see that there is no outgoing edge labeled 0, so we compute


$$\delta_N^*(q_0, 0) \cup \delta_N^*(q_1, 0) = \{q_0, q_1, q_2\}.$$


This gives us the new state $\{q_0, q_1, q_2\}$ and the transition


$$\delta_D(\{q_0, q_1\}, 0) = \{q_0, q_1, q_2\}.$$


Figure 2.14


Then, using a = 1, i = 0, j = 1, k = 2,


$$\delta_N^*(q_0, 1) \cup \delta_N^*(q_1, 1) \cup \delta_N^*(q_2, 1) = \{q_1, q_2\}$$


makes it necessary to introduce yet another state $\{q_1, q_2\}$. At this point, we have the partially
constructed automaton shown in Figure 2.15. Since there are still some missing edges, we continue
until we obtain the complete solution in Figure 2.16.


Figure 2.15 


Figure 2.16 


One important conclusion we can draw from Theorem 2.2 is that every language accepted by an 
nfa is regular. 


EXERCISES 


1. Use the construction of Theorem 2.2 to convert the nfa in Figure 2.10 to a dfa. Can you see a 
simpler answer more directly? 


2. Convert the nfa in Exercise 12, Section 2.2, into an equivalent dfa. 


3. Convert the following nfa into an equivalent dfa.


[Figure: transition graph of the nfa for Exercise 3]


4. Carefully complete the arguments in the proof of Theorem 2.2. Show in detail that if the label of
$\delta_D^*(\{q_0\}, w)$ contains $q_f$, then $\delta_N^*(q_0, w)$ also contains $q_f$.


5. Is it true that for any nfa $M = (Q, \Sigma, \delta, q_0, F)$ the complement of L(M) is equal to the set
$\{w \in \Sigma^* : \delta^*(q_0, w) \cap F = \varnothing\}$? If so, prove it. If not, give a counterexample.


6. Is it true that for every nfa $M = (Q, \Sigma, \delta, q_0, F)$ the complement of L(M) is equal to the set
$\{w \in \Sigma^* : \delta^*(q_0, w) \cap (Q - F) = \varnothing\}$? If so, prove it; if not, give a counterexample.


7. Prove that for every nfa with an arbitrary number of final states there is an equivalent nfa with only
one final state. Can we make a similar claim for


8. Find an nfa without $\lambda$-transitions and with a single final state that accepts the set $\{a\} \cup \{b^n : n \ge 1\}$.


9. Let L be a regular language that does not contain $\lambda$. Show that there exists an nfa without
$\lambda$-transitions and with a single final state that accepts L.


10. Define a dfa with multiple initial states in an analogous way to the corresponding nfa in Exercise 
18, Section 2.2. Does there always exist an equivalent dfa with a single initial state? 


11. Prove that all finite languages are regular. 


12. Show that if L is regular, so is $L^R$.


13. Give a simple verbal description of the language accepted by the dfa in Figure 2.16. Use this to 
find another dfa, equivalent to the given one, but with fewer states. 


*14. Let L be any language. Define even(w) as the string obtained by extracting from w the letters in
even-numbered positions; that is, if


$$w = a_1 a_2 a_3 a_4 \ldots,$$


then


$$even(w) = a_2 a_4 \ldots.$$


Corresponding to this, we can define a language


$$even(L) = \{even(w) : w \in L\}.$$


Prove that if L is regular, so is even(L).


15. From a language L we create a new language chop2(L) by removing the two leftmost symbols of
every string in L. Specifically,


$$chop2(L) = \{w : vw \in L, \text{ with } |v| = 2\}.$$


Show that if L is regular, then chop2(L) is also regular.


2.4 Reduction of the Number of States in Finite Automata*


Any dfa defines a unique language, but the converse is not true. For a given language, there are many 
dfa's that accept it. There may be a considerable difference in the number of states of such equivalent 
automata. In terms of the questions we have considered so far, all solutions are equally satisfactory, 
but if the results are to be applied in a practical setting, there may be reasons for preferring one over 
another. 


Example 2.14 


The two dfa's depicted in Figure 2.17(a) and 2.17(b) are equivalent, as a few test strings will quickly
reveal. We notice some obviously unnecessary features of Figure 2.17(a). The state $q_5$ plays
absolutely no role in the automaton since it can never be reached from the initial state $q_0$. Such a state
is inaccessible, and it can be removed (along with all transitions relating to it) without affecting the
language accepted by the automaton. But even after the removal of $q_5$, the first automaton has some
redundant parts. The states reachable subsequent to the first move $\delta(q_0, 0)$ mirror those reachable
from a first move $\delta(q_0, 1)$. The second automaton combines these two options.




(a) (b) 


Figure 2.17 


From a strictly theoretical point of view, there is little reason for preferring the automaton in 
Figure 2.17(b) over that in Figure 2.17(a). However, in terms of simplicity, the second alternative is 
clearly preferable. Representation of an automaton for the purpose of computation requires space 
proportional to the number of states. For storage efficiency, it is desirable to reduce the number of 
states as far as possible. We now describe an algorithm that accomplishes this. 


Definition 2.8 


Two states p and q of a dfa are called indistinguishable if


$$\delta^*(p, w) \in F \text{ implies } \delta^*(q, w) \in F,$$


and


$$\delta^*(p, w) \notin F \text{ implies } \delta^*(q, w) \notin F,$$


for all $w \in \Sigma^*$. If, on the other hand, there exists some string $w \in \Sigma^*$ such that


$$\delta^*(p, w) \in F \text{ and } \delta^*(q, w) \notin F,$$


or vice versa, then the states p and q are said to be distinguishable by a string w.


Clearly, two states are either indistinguishable or distinguishable. Indistinguishability has the
properties of an equivalence relation: If p and q are indistinguishable and if q and r are also
indistinguishable, then so are p and r, and all three states are indistinguishable.


One method for reducing the states of a dfa is based on finding and combining indistinguishable 
states. We first describe a method for finding pairs of distinguishable states. 
procedure: mark 


1. Remove all inaccessible states. This can be done by enumerating all simple paths of the graph of 
the dfa starting at the initial state. Any state not part of some path is inaccessible. 


2. Consider all pairs of states (p, q). If $p \in F$ and $q \notin F$ or vice versa, mark the pair (p, q) as
distinguishable.


3. Repeat the following step until no previously unmarked pairs are marked. For all pairs (p, q) and
all $a \in \Sigma$, compute $\delta(p, a) = p_a$ and $\delta(q, a) = q_a$. If the pair $(p_a, q_a)$ is marked as distinguishable,
mark (p, q) as distinguishable.

We claim that this procedure constitutes an algorithm for marking all distinguishable pairs. 
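The three steps of mark can be sketched in Python as follows. This is an illustration under stated assumptions: the dfa is given as a total transition dictionary, pairs are stored as two-element frozensets, and inaccessible states are found with an ordinary graph search rather than by enumerating simple paths. The sample dfa and all names are hypothetical.

```python
from itertools import combinations

def reachable(alphabet, delta, start):
    """Step 1: the states accessible from the initial state."""
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            p = delta[(q, a)]
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def mark(alphabet, delta, start, finals):
    """Return the set of distinguishable pairs as frozensets {p, q}."""
    states = reachable(alphabet, delta, start)
    pairs = [frozenset(c) for c in combinations(sorted(states), 2)]
    # Step 2: exactly one of the two states is final.
    marked = {pq for pq in pairs if len(pq & finals) == 1}
    changed = True
    while changed:                       # step 3: repeat until nothing new
        changed = False
        for pq in pairs:
            if pq in marked:
                continue
            p, q = tuple(pq)
            for a in alphabet:
                succ = frozenset({delta[(p, a)], delta[(q, a)]})
                if succ in marked:
                    marked.add(pq)
                    changed = True
                    break
    return marked

# Hypothetical dfa: q0 and q1 behave identically, q3 is inaccessible.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
    ("q3", "0"): "q0", ("q3", "1"): "q3",
}
MARKED = mark(["0", "1"], DELTA, "q0", {"q2"})
```

For this assumed dfa only the pairs involving the final state q2 are marked, so q0 and q1 remain unmarked and are therefore indistinguishable.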


Theorem 2.3 


The procedure mark, applied to any dfa $M = (Q, \Sigma, \delta, q_0, F)$, terminates and determines all pairs of 
distinguishable states. 

Proof: Obviously, the procedure terminates, since there are only a finite number of pairs that can be 
marked. It is also easy to see that the states of any pair so marked are distinguishable. The only claim 
that requires elaboration is that the procedure finds all distinguishable pairs. 

Note first that states $q_i$ and $q_j$ are distinguishable with a string of length n if and only if there are
transitions


$$\delta(q_i, a) = q_k \qquad (2.5)$$


and


$$\delta(q_j, a) = q_l, \qquad (2.6)$$


for some $a \in \Sigma$, with $q_k$ and $q_l$ distinguishable by a string of length n − 1. We use this first
to show that at the completion of the nth pass through the loop in step 3, all states distinguishable by
strings of length n or less have been marked. In step 2, we mark all pairs distinguishable by $\lambda$, so
we have a basis with n = 0 for an induction. We now assume that the claim is true for all i = 0, 1, ..., n − 1.
By this inductive assumption, at the beginning of the nth pass through the loop, all states
distinguishable by strings of length up to n − 1 have been marked. Because of (2.5) and (2.6) above, at the end of this
pass, all states distinguishable by strings of length up to n will be marked. By induction then, we can
claim that, for any n, at the completion of the nth pass, all pairs distinguishable by strings of length n
or less have been marked.


To show that this procedure marks all distinguishable states, assume that the loop terminates after
n passes. This means that during the nth pass no new states were marked. From (2.5) and (2.6), it
then follows that there cannot be any states distinguishable by a string of length n, but not
distinguishable by any shorter string. But if there are no states distinguishable only by strings of length
n, there cannot be any states distinguishable only by strings of length n + 1, and so on. As a
consequence, when the loop terminates, all distinguishable pairs have been marked. ■


The procedure mark can be implemented by partitioning the states into equivalence classes. 
Whenever two states are found to be distinguishable, they are immediately put into separate 
equivalence classes. 


Example 2.15 


Consider the automaton in Figure 2.18. 


In the second step of procedure mark we partition the state set into final and nonfinal states to get
two equivalence classes $\{q_0, q_1, q_3\}$ and $\{q_2, q_4\}$. In the next step, when we compute


$$\delta(q_0, 0) = q_1$$


and


$$\delta(q_1, 0) = q_2,$$


we recognize that $q_0$ and $q_1$ are distinguishable, so we put them into different sets. So $\{q_0, q_1, q_3\}$ is
split into $\{q_0\}$ and $\{q_1, q_3\}$. Also, since $\delta(q_2, 0) = q_3$ and $\delta(q_4, 0) = q_4$, the class $\{q_2, q_4\}$ is split into
$\{q_2\}$ and $\{q_4\}$. The rest of the computations show that no further splitting is needed.


Figure 2.18 


Once the indistinguishability classes are found, the construction of the minimal dfa is 
straightforward. 


procedure: reduce 


Given a dfa $M = (Q, \Sigma, \delta, q_0, F)$, we construct a reduced dfa $\hat{M} = (\hat{Q}, \Sigma, \hat{\delta}, \hat{q}_0, \hat{F})$ as follows.


1. Use procedure mark to generate the equivalence classes, say $\{q_i, q_j, \ldots, q_k\}$, as described.


2. For each set $\{q_i, q_j, \ldots, q_k\}$ of such indistinguishable states, create a state labeled $ij \cdots k$ for $\hat{M}$.

3. For each transition rule of M of the form


$$\delta(q_r, a) = q_p,$$


find the sets to which $q_r$ and $q_p$ belong. If $q_r \in \{q_i, q_j, \ldots, q_k\}$ and $q_p \in \{q_l, q_m, \ldots, q_n\}$, add to $\hat{\delta}$ a
rule


$$\hat{\delta}(ij \cdots k, a) = lm \cdots n.$$


4. The initial state $\hat{q}_0$ is that state of $\hat{M}$ whose label includes the 0.


5. $\hat{F}$ is the set of all the states whose label contains i such that $q_i \in F$.
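A compact Python sketch of reduce is given below. It is an illustration under stated assumptions rather than the book's exact procedure: the equivalence classes are obtained by repeatedly splitting the final/nonfinal partition, in the spirit of the partitioning view of mark used in Example 2.15, and each reduced state is represented by a frozenset of original states instead of a label such as 13. The sample dfa and the name `reduce_dfa` are hypothetical.

```python
def reduce_dfa(alphabet, delta, start, finals):
    """Return (states, delta, start, finals) of the reduced dfa."""
    # Keep accessible states only.
    states, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            p = delta[(q, a)]
            if p not in states:
                states.add(p)
                stack.append(p)
    # Begin with the final / nonfinal split and refine until stable.
    blocks = [b for b in (states & finals, states - finals) if b]
    while True:
        def block_of(q):
            return next(i for i, b in enumerate(blocks) if q in b)
        refined = []
        for b in blocks:
            groups = {}
            for q in b:
                # States stay together only if all successors agree by block.
                key = tuple(block_of(delta[(q, a)]) for a in alphabet)
                groups.setdefault(key, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(blocks):
            break
        blocks = refined
    # One reduced state per class of indistinguishable states.
    name = {q: frozenset(b) for b in blocks for q in b}
    new_delta = {(name[q], a): name[delta[(q, a)]]
                 for q in states for a in alphabet}
    new_finals = {name[q] for q in states & finals}
    return set(name.values()), new_delta, name[start], new_finals

# Hypothetical dfa: q0 and q1 are indistinguishable, q3 is inaccessible.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
    ("q3", "0"): "q0", ("q3", "1"): "q3",
}
R_STATES, R_DELTA, R_START, R_FINALS = reduce_dfa(["0", "1"], DELTA, "q0", {"q2"})
```

For the assumed dfa the result has two states, the merged class containing q0 and q1 and the class containing q2 alone.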


Example 2.16 


Continuing with Example 2.15, we create the states in Figure 2.19. Since, for example,


$$\delta(q_1, 0) = q_2,$$


there is an edge labeled 0 from state 13 to state 2. The rest of the transitions are easily found, giving the minimal
dfa in Figure 2.19.


Figure 2.19 
Theorem 2.4 


Given any dfa M, application of the procedure reduce yields another dfa $\hat{M}$ such that


$$L(M) = L(\hat{M}).$$


Furthermore, $\hat{M}$ is minimal in the sense that there is no other dfa with a smaller number of states that
also accepts L(M).


Proof: There are two parts. The first is to show that the dfa created by reduce is equivalent to the
original dfa. This is relatively easy and we can use inductive arguments similar to those used in
establishing the equivalence of dfa's and nfa's. All we have to do is to show that $\delta^*(q_i, w) = q_j$ if and
only if the label of $\hat{\delta}^*(q_i, w)$ is of the form ...j.... We will leave this as an exercise.


The second part, to show that $\hat{M}$ is minimal, is harder. Suppose $\hat{M}$ has states $\{p_0, p_1, p_2, \ldots, p_m\}$,
with $p_0$ the initial state. Assume that there is an equivalent dfa $M_1$, with transition function $\delta_1$ and
initial state $q_0$, equivalent to $\hat{M}$ but with fewer states. Since there are no inaccessible states in $\hat{M}$,
there must be distinct strings $w_1, w_2, \ldots, w_m$ such that


$$\hat{\delta}^*(p_0, w_i) = p_i, \quad i = 1, 2, \ldots, m.$$


But since $M_1$ has fewer states than $\hat{M}$, there must be at least two of these strings, say $w_k$ and $w_l$, such
that


$$\delta_1^*(q_0, w_k) = \delta_1^*(q_0, w_l).$$


Since $p_k$ and $p_l$ are distinguishable, there must be some string x such that $\hat{\delta}^*(p_0, w_k x) = \hat{\delta}^*(p_k, x)$ is a
final state, and $\hat{\delta}^*(p_0, w_l x) = \hat{\delta}^*(p_l, x)$ is a nonfinal state (or vice versa). In other words, $w_k x$ is
accepted by $\hat{M}$ and $w_l x$ is not. But note that


$$
\begin{aligned}
\delta_1^*(q_0, w_k x) &= \delta_1^*(\delta_1^*(q_0, w_k), x) \\
&= \delta_1^*(\delta_1^*(q_0, w_l), x) \\
&= \delta_1^*(q_0, w_l x).
\end{aligned}
$$


Thus, $M_1$ either accepts both $w_k x$ and $w_l x$ or rejects both, contradicting the assumption that $\hat{M}$ and
$M_1$ are equivalent. This contradiction proves that $M_1$ cannot exist. ■


EXERCISES 


1. Minimize the number of states in the dfa in Figure 2.16. 
2. Find minimal dfa's for the following languages. In each case prove that the result is minimal.
(a) $L = \{a^n b^m : n \ge 2, m \ge 1\}$.
(b) $L = \{a^n b : n \ge 0\} \cup \{b^n a : n \ge 1\}$.
(c) $L = \{a^n : n \ge 0, n \ne 3\}$.
(d) $L = \{a^n : n \ne 2 \text{ and } n \ne 4\}$.
(e) $L = \{a^n : n \bmod 3 = 0\} \cup \{a^n : n \bmod 5 = 1\}$.
3. Show that the automaton generated by procedure reduce is deterministic. 


4. Minimize the states in the dfa depicted in the following diagram. 


5. Show that if L is a nonempty language such that any w in L has length at least n, then any dfa 
accepting L must have at least n + 1 states. 


6. Prove or disprove the following conjecture. If $M = (Q, \Sigma, \delta, q_0, F)$ is a minimal dfa for a regular
language L, then $\hat{M} = (Q, \Sigma, \delta, q_0, Q - F)$ is a minimal dfa for $\overline{L}$.

7. Show that indistinguishability is an equivalence relation but that distinguishability is not. 

8. Show the explicit steps of the suggested proof of the first part of Theorem 2.4, namely, that $\hat{M}$ is
equivalent to the original dfa.

9. Prove the following: If the states $q_a$ and $q_b$ are indistinguishable, and if $q_a$ and $q_c$ are
distinguishable, then $q_b$ and $q_c$ must be distinguishable.


* 10. Show that given a regular language L, its minimal dfa is unique within a simple relabeling of 
the states. 


Chapter 3 


Regular Languages and Regular Grammars 


According to our definition, a language is regular if there exists a finite accepter for it.
Therefore, every regular language can be described by some dfa or some nfa. Such a
description can be very useful, for example, if we want to show the logic by which we
decide if a given string is in a certain language. But in many instances, we need more
concise ways of describing regular languages. In this chapter, we look at other ways of
representing regular languages. These representations have important practical applications, a matter
that is touched on in some of the examples and exercises.


3.1 Regular Expressions 


One way of describing regular languages is via the notation of regular expressions. This notation
involves a combination of strings of symbols from some alphabet $\Sigma$, parentheses, and the operators +,
·, and *. The simplest case is the language {a}, which will be denoted by the regular expression a.
Slightly more complicated is the language {a, b, c}, for which, using the + to denote union, we have
the regular expression a + b + c. We use · for concatenation and * for star-closure in a similar way. The
expression (a + (b · c))* stands for the star-closure of $\{a\} \cup \{bc\}$, that is, the language $\{\lambda, a, bc, aa,
abc, bca, bcbc, aaa, aabc, \ldots\}$.


Formal Definition of a Regular Expression 


We construct regular expressions from primitive constituents by repeatedly applying certain recursive 
rules. This is similar to the way we construct familiar arithmetic expressions. 


Definition 3.1 


Let $\Sigma$ be a given alphabet. Then
1. $\varnothing$, $\lambda$, and $a \in \Sigma$ are all regular expressions. These are called primitive regular expressions.
2. If $r_1$ and $r_2$ are regular expressions, so are $r_1 + r_2$, $r_1 \cdot r_2$, $r_1^*$, and $(r_1)$.


3. A string is a regular expression if and only if it can be derived from the primitive regular 
expressions by a finite number of applications of the rules in (2). 


Example 3.1 


For $\Sigma = \{a, b, c\}$, the string


$$(a + b \cdot c)^* \cdot (c + \varnothing)$$


is a regular expression, since it is constructed by application of the above rules. For example, if we
take $r_1 = c$ and $r_2 = \varnothing$, we find that $c + \varnothing$ and $(c + \varnothing)$ are also regular expressions. Repeating this, we
eventually generate the whole string. On the other hand, (a + b +) is not a regular expression, since
there is no way it can be constructed from the primitive regular expressions.


Languages Associated with Regular Expressions 


Regular expressions can be used to describe some simple languages. If r is a regular expression, we
will let L(r) denote the language associated with r.


Definition 3.2 


The language L(r) denoted by any regular expression r is defined by the following rules.
1. $\varnothing$ is a regular expression denoting the empty set,
2. $\lambda$ is a regular expression denoting $\{\lambda\}$.
3. For every $a \in \Sigma$, a is a regular expression denoting {a}.

If $r_1$ and $r_2$ are regular expressions, then

4. $L(r_1 + r_2) = L(r_1) \cup L(r_2)$,
5. $L(r_1 \cdot r_2) = L(r_1) L(r_2)$,
6. $L((r_1)) = L(r_1)$,
7. $L(r_1^*) = (L(r_1))^*$.

The last four rules of this definition are used to reduce L(r) to simpler components recursively;
the first three are the termination conditions for this recursion. To see what language a given
expression denotes, we apply these rules repeatedly.
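The recursion in Definition 3.2 can be sketched directly in Python. This is only an illustration under stated assumptions: a regular expression is represented as a nested tuple (a hypothetical encoding chosen for this sketch), and because star-closure generally yields an infinite language, the function returns only the strings of L(r) whose length does not exceed a given bound. Rule 6 needs no case of its own, since the nesting of the tuples already plays the role of parentheses.

```python
def lang(r, max_len):
    """Strings of L(r) with length <= max_len, following Definition 3.2.

    r is one of: ("empty",), ("lambda",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind == "empty":                       # rule 1
        return set()
    if kind == "lambda":                      # rule 2
        return {""}
    if kind == "sym":                         # rule 3
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":                       # rule 4
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "concat":                      # rule 5
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                        # rule 7
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Extend the newest strings by one more factor from the base.
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

# a* . (a + b), written in the tuple encoding assumed above
R = ("concat", ("star", ("sym", "a")), ("union", ("sym", "a"), ("sym", "b")))
```

Calling `lang(R, 3)` lists the members of the language of a* · (a + b) that have length at most three.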


Example 3.2 


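Exhibit the language $L(a^* \cdot (a + b))$ in set notation.


$$
\begin{aligned}
L(a^* \cdot (a + b)) &= L(a^*)\,L(a + b) \\
&= (L(a))^* (L(a) \cup L(b)) \\
&= \{\lambda, a, aa, aaa, \ldots\}\{a, b\} \\
&= \{a, aa, aaa, \ldots, b, ab, aab, \ldots\}.
\end{aligned}
$$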


There is one problem with rules (4) to (7) in Definition 3.2. They define a language precisely if $r_1$
and $r_2$ are given, but there may be some ambiguity in breaking a complicated expression into parts.
Consider, for example, the regular expression a · b + c. We can consider this as being made up of
$r_1 = a \cdot b$ and $r_2 = c$. In this case, we find L(a · b + c) = {ab, c}. But there is nothing in Definition 3.2 to
stop us from taking $r_1 = a$ and $r_2 = b + c$. We now get a different result, L(a · b + c) = {ab, ac}. To
overcome this, we could require that all expressions be fully parenthesized, but this gives
cumbersome results. Instead, we use a convention familiar from mathematics and programming
languages. We establish a set of precedence rules for evaluation in which star-closure precedes
concatenation and concatenation precedes union. Also, the symbol for concatenation may be omitted,
so we can write $r_1 r_2$ for $r_1 \cdot r_2$.


With a little practice, we can see quickly what language a particular regular expression denotes. 


Example 3.3 


For Σ = {a, b}, the expression 
r = (a + b)* (a + bb) 
is regular. It denotes the language 
L(r) = {a, bb, aa, abb, ba, bbb, ...}. 


We can see this by considering the various parts of r. The first part, (a + b)*, stands for any string of 
a’s and b’s. The second part, (a + bb), represents either an a or a double b. Consequently, L(r) is the 
set of all strings on {a, b}, terminated by either an a or a bb. 


Example 3.4 


The expression 
r =(aa)* (bb)* b 
denotes the set of all strings with an even number of a’s followed by an odd number of b’s; that is, 


L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}. 


Going from an informal description or set notation to a regular expression tends to be a little 
harder. 


Example 3.5 


For Σ = {0, 1}, give a regular expression r such that 
L(r) = {w ∈ Σ* : w has at least one pair of consecutive zeros}. 
One can arrive at an answer by reasoning something like this: Every string in L(r) must contain 00 


somewhere, but what comes before and what goes after is completely arbitrary. An arbitrary string on 
{0,1} can be denoted by (0+1)*. Putting these observations together, we arrive at the solution 


r= (0 + 1)* 00(0 + 1)*. 


Example 3.6 


Find a regular expression for the language 
L = {w ∈ {0,1}* : w has no pair of consecutive zeros}. 


Even though this looks similar to Example 3.5, the answer is harder to construct. One helpful 
observation is that whenever a 0 occurs, it must be followed immediately by a 1. Such a substring 
may be preceded and followed by an arbitrary number of 1’s. This suggests that the answer involves 
the repetition of strings of the form 1...101...1, that is, the language denoted by the regular expression 
(1*011*)*. However, the answer is still incomplete, since the strings ending in 0 or consisting of all 
1’s are unaccounted for. After taking care of these special cases we arrive at the answer 


r = (1*011*)* (0 + λ) + 1* (0 + λ). 


If we reason slightly differently, we might come up with another answer. If we see L as the 
repetition of the strings 1 and 01, the shorter expression 


r = (1 + 01)* (0 + λ) 


might be reached. Although the two expressions look different, both answers are correct, as they 
denote the same language. Generally, there are an unlimited number of regular expressions for any 
given language. 


Note that this language is the complement of the language in Example 3.5. However, the regular 
expressions are not very similar and do not suggest clearly the close relationship between the 
languages. 
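
The claims made in Examples 3.5 and 3.6 can be spot-checked by brute force. The sketch below is an illustration only and proves nothing about strings longer than those tested. It assumes Python's re module as the matcher, in which union is written | instead of + and λ is written as an empty alternative; the names HAS_00, NO_00_LONG, NO_00_SHORT, and disagreements are hypothetical.

```python
import re
from itertools import product

# Example 3.5: at least one pair of consecutive zeros.
HAS_00 = re.compile(r"(0|1)*00(0|1)*")
# Example 3.6, first answer: (1*011*)* (0 + λ) + 1* (0 + λ).
NO_00_LONG = re.compile(r"(1*011*)*(0|)|1*(0|)")
# Example 3.6, second answer: (1 + 01)* (0 + λ).
NO_00_SHORT = re.compile(r"(1|01)*(0|)")


def disagreements(max_len):
    """Strings on {0, 1} up to max_len on which the expressions disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            in_long = NO_00_LONG.fullmatch(w) is not None
            in_short = NO_00_SHORT.fullmatch(w) is not None
            in_has = HAS_00.fullmatch(w) is not None
            # Both answers must agree, and each must be the complement
            # of the language of Example 3.5.
            if not (in_long == in_short == (not in_has)):
                bad.append(w)
    return bad
```

An empty result from disagreements(10) is consistent with the two answers being equivalent and with their language being the complement of the one in Example 3.5, at least for strings of length ten or less.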


The last example introduces the notion of equivalence of regular expressions. We say the two 
regular expressions are equivalent if they denote the same language. One can derive a variety of rules 
for simplifying regular expressions (see Exercise 20 in the following exercise section), but since we 


have little need for such manipulations we will not pursue this. 


EXERCISES 


1. Find all strings in L((a + b) b (a + ab)*) of length less than four. 
2. Does the expression ((0 + 1) (0 + 1)*)* 00 (0 + 1)* denote the language in Example 3.5? 


3. Show that r = (1 + 01)* (0 + 1*) also denotes the language in Example 3.6. Find two other 
equivalent expressions. 


4. Find a regular expression for the set {a^n b^m : n > 3, m is even}. 
5. Find a regular expression for the set {a^n b^m : (n + m) is even}. 
6. Give regular expressions for the following languages. 

(a) L1 = {a^n b^m : n = 4, m < 3}. 

(b) L2 = {a^n b^m : n < 4, m < 3}. 

(c) The complement of L1. 

(d) The complement of L2. 


7. What languages do the expressions (Ø*)* and aØ denote? 
8. Give a simple verbal description of the language L ((aa)* b (aa)* + a (aa)* ba (aa)*). 


9. Give a regular expression for L^R, where L is the language in Exercise 1. 
10. Give a regular expression for L = {a^n b^m : n ≥ 1, m ≥ 1, nm ≥ 3}. 
11. Find a regular expression for L = {ab^n w : n > 3, w ∈ {a, b}*}. 
12. Find a regular expression for the complement of the language in Example 3.4. 
13. Find a regular expression for L = {vwv : v, w ∈ {a, b}*, |v| = 2}. 
14. Find a regular expression for L = {vwv : v, w ∈ {a, b}*, |v| < 3}. 
15. Find a regular expression for 
L = {w ∈ {0,1}* : w has exactly one pair of consecutive zeros}. 
16. Give regular expressions for the following languages on Σ = {a, b, c}. 


(a) all strings containing exactly one a, 


(b) all strings containing no more than three a’s, 


(c) all strings that contain at least one occurrence of each symbol in Σ, 
(d) all strings that contain no run of a's of length greater than two, 


* (e) all strings in which all runs of a's have lengths that are multiples of three. 
17. Write regular expressions for the following languages on {0, 1}. 


(a) all strings ending in 01, 
(b) all strings not ending in 01, 
(c) all strings containing an even number of 0’s, 


(d) all strings having at least two occurrences of the substring 00. (Note that with the usual 
interpretation of a substring, 000 contains two such occurrences), 


(e) all strings with at most two occurrences of the substring 00, 


*(f) all strings not containing the substring 101. 
18. Find regular expressions for the following languages on {a, b}. 
(a) L= {w : |w| mod 3 = 0}. 
(b) L = {w : n_a(w) mod 3 = 0}. 
(c) L = {w : n_a(w) mod 5 > 0}. 
19. Repeat parts (a), (b), and (c) of Exercise 18, with Σ = {a, b, c}. 


20. Determine whether or not the following claims are true for all regular expressions r1 and r2. The 
symbol ≡ stands for equivalence of regular expressions in the sense that both expressions denote 
the same language. 


21. Give a general method by which any regular expression r can be changed into r̂ such that 
(L(r))^R = L(r̂). 
22. Prove rigorously that the expressions in Example 3.6 do indeed denote the specified language. 


23. For the case of a regular expression r that does not involve λ or Ø, give a set of necessary and 
sufficient conditions that r must satisfy if L(r) is to be infinite. 


24. Formal languages can be used to describe a variety of two-dimensional figures. Chain-code 
languages are defined on the alphabet Σ = {u, d, r, l}, where these symbols stand for unit-length 
straight lines in the directions up, down, right, and left, respectively. An example of this notation 


is urdl, which stands for the square with sides of unit length. Draw pictures of the figures 
denoted by the expressions (rd)*, (urddru)*, and (ruldr)*. 


25. In Exercise 24, what are sufficient conditions on the expression so that the picture is a closed 
contour in the sense that the beginning and ending points are the same? Are these conditions also 
necessary? 


26. Find an nfa that accepts the language L (aa* (a + b)). 


27. Find a regular expression that denotes all bit strings whose value, when interpreted as a binary 
integer, is greater than or equal to 40. 


28. Find a regular expression for all bit strings, with leading bit 1, interpreted as a binary integer, 
with values not between 10 and 30. 


3.2 Connection between Regular Expressions and Regular 
Languages 


As the terminology suggests, the connection between regular languages and regular expressions is a 
close one. The two concepts are essentially the same; for every regular language there is a regular 
expression, and for every regular expression there is a regular language. We will show this in two 
parts. 


Regular Expressions Denote Regular Languages 


We first show that if r is a regular expression, then L(r) is a regular language. Our definition says that 
a language is regular if it is accepted by some dfa. Because of the equivalence of nfa's and dfa's, a 
language is also regular if it is accepted by some nfa. We now show that if we have any regular 
expression r, we can construct an nfa that accepts L(r). The construction for this relies on the 
recursive definition for L(r). We first construct simple automata for parts (1), (2), and (3) of 
Definition 3.2, then show how they can be combined to implement the more complicated parts (4), 
(5), and (7). 


Theorem 3.1 


Let r be a regular expression. Then there exists some nondeterministic finite accepter that accepts 
L(r). Consequently, L(r) is a regular language. 

Proof: We begin with automata that accept the languages for the simple regular expressions Ø, λ, and 
a ∈ Σ. These are shown in Figure 3.1(a), (b), and (c), respectively. Assume now that we have 
automata M(r1) and M(r2) that accept languages denoted by regular expressions r1 and r2, 
respectively. We need not explicitly construct these automata, but may represent them schematically, 
as in Figure 3.2. In this scheme, the graph vertex at the left represents the initial state, the one on the 
right the final state. In Exercise 7, Section 2.3, we claim that for every nfa there is an equivalent one 
with a single final state, so we lose nothing in assuming that there is only one final state. With M(r1) 
and M(r2) represented in this way, we then construct automata for the regular expressions r1 + r2, 
r1r2, and r1*. The constructions are shown in Figures 3.3 to 3.5. As indicated in the drawings, the 
initial and final states of the constituent machines lose their status and are replaced by new initial and 
final states. By stringing together several such steps, we can build automata for arbitrarily complex 
regular expressions. 


It should be clear from the interpretation of the graphs in Figures 3.3 to 3.5 that this construction 
works. To argue more rigorously, we can give a formal method for constructing the states and 
transitions of the combined machine from the states and transitions of the parts, then prove by 
induction on the number of operators that the construction yields an automaton that accepts the 
language denoted by any particular regular expression. We will not belabor this point, as it is 
reasonably obvious that the results are always correct. ■ 


Figure 3.1 


(a) nfa accepts Ø. 
(b) nfa accepts {A}. 
(c) nfa accepts {a}. 


Figure 3.2 


Schematic representation of an nfa accepting L(r). 


Figure 3.3 


Automaton for L(r1 + r2). 


Figure 3.4 


Automaton for L(r1r2). 

Figure 3.5 

Automaton for L(r1*). 

Example 3.7 


Find an nfa that accepts L(r), where 
r = (a + bb)* (ba* + λ). 


Automata for (a + bb) and (ba* + λ), constructed directly from first principles, are given in Figure 
3.6. Putting these together using the construction in Theorem 3.1, we get the solution in Figure 3.7. 


Figure 3.6 


(a) M1 accepts L(a + bb). 
(b) M2 accepts L(ba* + λ). 


Figure 3.7 


Automaton accepts L((a + bb)* (ba* + λ)). 
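
The construction of Theorem 3.1 lends itself to a short program. What follows is a sketch under stated assumptions rather than a definitive implementation: regular expressions are nested tuples with the hypothetical tags "empty", "lambda", "sym", "union", "cat", and "star"; every sub-automaton gets a fresh initial and final state; and the parts are wired together with λ-transitions, written here as the empty string. The exact wiring is one reasonable reading of Figures 3.3 to 3.5, and unlike Figure 3.6 it builds every part mechanically instead of from first principles.

```python
from itertools import count

_fresh = count()  # supplies new state numbers


def build(r, delta):
    """Add an nfa for r to delta and return its (initial, final) states."""
    start, end = next(_fresh), next(_fresh)

    def add(p, a, q):
        delta.setdefault((p, a), set()).add(q)

    kind = r[0]
    if kind == "empty":
        pass                               # no path from start to end
    elif kind == "lambda":
        add(start, "", end)
    elif kind == "sym":
        add(start, r[1], end)
    elif kind == "union":                  # run either part
        for part in (r[1], r[2]):
            s, e = build(part, delta)
            add(start, "", s)
            add(e, "", end)
    elif kind == "cat":                    # run the parts one after the other
        s1, e1 = build(r[1], delta)
        s2, e2 = build(r[2], delta)
        add(start, "", s1)
        add(e1, "", s2)
        add(e2, "", end)
    elif kind == "star":                   # skip the part or repeat it
        s, e = build(r[1], delta)
        add(start, "", s)
        add(e, "", end)
        add(start, "", end)
        add(end, "", start)
    else:
        raise ValueError(f"unknown node {kind!r}")
    return start, end


def closure(delta, states):
    """All states reachable from the given ones by λ-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        for q in delta.get((stack.pop(), ""), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(r, w):
    """Build the nfa for r and decide whether it accepts the string w."""
    delta = {}
    start, end = build(r, delta)
    current = closure(delta, {start})
    for a in w:
        moved = set()
        for p in current:
            moved |= delta.get((p, a), set())
        current = closure(delta, moved)
    return end in current


# r = (a + bb)* (ba* + λ) from Example 3.7.
A, B = ("sym", "a"), ("sym", "b")
EXAMPLE_3_7 = ("cat",
               ("star", ("union", A, ("cat", B, B))),
               ("union", ("cat", B, ("star", A)), ("lambda",)))
```

A call such as accepts(EXAMPLE_3_7, "abb") builds the automaton and simulates it by tracking the set of states the nfa could be in after each symbol.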


Regular Expressions for Regular Languages 


It is intuitively reasonable that the converse of Theorem 3.1 should hold, and that for every regular 
language, there should exist a corresponding regular expression. Since any regular language has an 
associated nfa and hence a transition graph, all we need to do is to find a regular expression capable 
of generating the labels of all the walks from q0 to any final state. This does not look too difficult, but 
it is complicated by the existence of cycles that can often be traversed arbitrarily, in any order. This 
creates a bookkeeping problem that must be handled carefully. There are several ways to do this; one 
of the more intuitive approaches requires a side trip into what are called generalized transition 
graphs (GTG). Since this idea is used here in a limited way and plays no role in our further 
discussion, we will deal with it informally. 


A generalized transition graph is a transition graph whose edges are labeled with regular 
expressions; otherwise it is the same as the usual transition graph. The label of any walk from the 
initial state to a final state is the concatenation of several regular expressions, and hence itself a 
regular expression. The strings denoted by such regular expressions are a subset of the language 
accepted by the generalized transition graph, with the full language being the union of all such 
generated subsets. 


Example 3.8 


Figure 3.8 represents a generalized transition graph. The language accepted by it is L (a* + a* (a + b) 
c*), as should be clear from an inspection of the graph. The edge (q0, q0) labeled a is a cycle that can 
generate any number of a's, that is, it represents L(a*). We could have labeled this edge a* without 
changing the language accepted by the graph. 


Figure 3.8 




The graph of any nondeterministic finite accepter can be considered a generalized transition graph 
if the edge labels are interpreted properly. An edge labeled with a single symbol a is interpreted as 
an edge labeled with the expression a, while an edge labeled with multiple symbols a, b,...is 
interpreted as an edge labeled with the expression a + b + .... From this observation, it follows that 
for every regular language, there exists a generalized transition graph that accepts it. Conversely, 
every language accepted by a generalized transition graph is regular. Since the label of every walk in 
a generalized transition graph is a regular expression, this appears to be an immediate consequence of 
Theorem 3.1. However, there are some subtleties in the argument; we will not pursue them here, but 
refer the reader instead to Exercise 22, Section 4.3, for details. 


Equivalence for generalized transition graphs is defined in terms of the language accepted and the 
purpose of the next bit of discussion is to produce a sequence of increasingly simple GTGs. In this, 
we will find it convenient to work with complete GTGs. A complete GTG is a graph in which all 
edges are present. If a GTG, after conversion from an nfa, has some edges missing, we put them in 
and label them with Ø. A complete GTG with |V| vertices has exactly |V|^2 edges. 


Example 3.9 


The GTG in Figure 3.9(a) is not complete. Figure 3.9(b) shows how it is completed. 
Figure 3.9 


Suppose now that we have the simple two-state complete GTG shown in Figure 3.10. By mentally 
tracing through this GTG you can convince yourself that the regular expression 


r = r11* r12 (r22 + r21 r11* r12)* (3.1) 


covers all possible paths and so is the correct regular expression associated with the graph. 


When a GTG has more than two states, we can find an equivalent graph by removing one state at a 
time. We will illustrate this with an example before going to the general method. 


Figure 3.10 


Example 3.10 


Consider the complete GTG in Figure 3.11. To remove q2, we first introduce some new edges. We 


create an edge from q1 to q1 and label it e + af*b, 
create an edge from q1 to q3 and label it h + af*c, 
create an edge from q3 to q1 and label it i + df*b, 
create an edge from q3 to q3 and label it g + df*c. 


When this is done, we remove q2 and all associated edges. This gives the GTG in Figure 3.12. You 
can explore the equivalence of the two GTGs by seeing how regular expressions such as af*c and e* 
ab are generated. 


Figure 3.11 


Figure 3.12 


For arbitrary GTGs we remove one state at a time until only two states are left. Then we apply 
Equation (3.1) to get the final regular expression. This tends to be a lengthy process, but it is 
straightforward, as the following procedure shows. 


procedure: nfa-to-rex 
1. Start with an nfa with states q0, q1, ..., qn, and a single final state, distinct from its initial state. 


2. Convert the nfa into a complete generalized transition graph. Let rij stand for the label of the edge 
from qi to qj. 


3. If the GTG has only two states, with qi as its initial state and qj its final state, its associated 
regular expression is 


r = rii* rij (rjj + rji rii* rij)*. (3.2) 


4. If the GTG has three states, with initial state qi, final state qj, and third state qk, introduce new 
edges, labeled 


rpq + rpk rkk* rkq (3.3) 


for p = i, j, q = i, j. When this is done, remove vertex qk and its associated edges. 


5. If the GTG has four or more states, pick a state qk to be removed. Apply rule 4 for all pairs of 
states (qi, qj), i ≠ k, j ≠ k. At each step apply the simplifying rules 


r + Ø = r, 
rØ = Ø, 
Ø* = λ, 


wherever possible. When this is done, remove state qk. 
6. Repeat Steps 3 to 5 until the correct regular expression is obtained. 
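
The procedure nfa-to-rex can be sketched in a few lines of code. The version below is an illustration under several assumptions that are not in the original: edge labels are kept as strings in the syntax of Python's re module (so union appears as | and grouping as (?:...)), None stands for Ø, the empty string stands for λ, and the helper names union, cat, star, and nfa_to_rex are hypothetical. Every edge label supplied by the caller is assumed to be a single symbol. The states and edges at the end encode the nfa of Example 3.11, with EE as initial state and EO as final state.

```python
EMPTY = None  # stands for Ø; the empty string "" stands for λ


def union(r, s):
    if r is EMPTY:
        return s                       # Ø + s = s
    if s is EMPTY or r == s:
        return r                       # r + Ø = r
    return f"(?:{r}|{s})"


def cat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY                   # rØ = Ø
    return r + s


def star(r):
    if r is EMPTY or r == "":
        return ""                      # Ø* = λ
    return f"(?:{r})*"


def nfa_to_rex(states, edges, initial, final):
    """State elimination; assumes initial and final are distinct states."""
    # Step 2: complete the graph, labeling missing edges with Ø.
    r = {(p, q): edges.get((p, q), EMPTY) for p in states for q in states}
    remaining = list(states)
    for k in states:
        if k in (initial, final):
            continue
        remaining.remove(k)
        loop = star(r[k, k])
        # Rule (3.3): r_pq + r_pk r_kk* r_kq for all surviving p and q.
        r = {(p, q): union(r[p, q], cat(cat(r[p, k], loop), r[k, q]))
             for p in remaining for q in remaining}
    i, j = initial, final
    # Equation (3.2): r_ii* r_ij (r_jj + r_ji r_ii* r_ij)*.
    head = cat(star(r[i, i]), r[i, j])
    back = cat(cat(r[j, i], star(r[i, i])), r[i, j])
    return cat(head, star(union(r[j, j], back)))


# The nfa of Example 3.11: the first letter of a state name records the
# parity of the a's read so far, the second the parity of the b's.
STATES = ["EE", "OE", "OO", "EO"]
EDGES = {
    ("EE", "OE"): "a", ("OE", "EE"): "a",
    ("EO", "OO"): "a", ("OO", "EO"): "a",
    ("EE", "EO"): "b", ("EO", "EE"): "b",
    ("OE", "OO"): "b", ("OO", "OE"): "b",
}
REX = nfa_to_rex(STATES, EDGES, "EE", "EO")
```

The resulting string REX is long and hard to read, which matches the remark that expressions obtained this way are of little practical use. It can nevertheless be handed to re.fullmatch to test individual strings.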


Example 3.11 


Find a regular expression for the language 
L = {w ∈ {a, b}* : n_a(w) is even and n_b(w) is odd}. 


An attempt to construct a regular expression directly from this description leads to all kinds of 
difficulties. On the other hand, finding an nfa for it is easy as long as we use vertex labeling 
effectively. We label the vertices with EE to denote an even number of a’s and b’s, with OE to denote 
an odd number of a’s and an even number of b’s, and so on. With this we easily get the solution 
which, after conversion into a complete generalized transition graph, is in Figure 3.13. 

We now apply the conversion to a regular expression, using procedure nfa-to-rex. To remove the 
state OE, we apply Equation (3.3). The edge between EE and itself will have the label 


rEE = Ø + aØ*a. 


We continue in this manner until we get the GTG in Figure 3.14. Next, the state OO is removed, 
which gives Figure 3.15. Finally, we get the correct regular expression from Equation (3.2). 


Figure 3.13 


Figure 3.14 


Figure 3.15 


b + ab(bb)*a 


The process of converting an nfa to a regular expression is mechanical but tedious. It leads to 
regular expressions that are complicated and of little practical use. The main reason for presenting 
this process is that it gives the idea for the proof of an important result. 


Theorem 3.2 


Let L be a regular language. Then there exists a regular expression r such that L = L(r). 


Proof: If L is regular, there exists an nfa for it. We can assume, without loss of generality, that this nfa 
has a single final state, distinct from its initial state. We convert this nfa to a complete generalized 
transition graph and apply the procedure nfa-to-rex to it. This yields the required regular expression 
r. 


While this can make the result plausible, a rigorous proof requires that we show that each step in 
the process generates an equivalent GTG. This is a technical matter we leave to the reader. ■ 


Regular Expressions for Describing Simple Patterns 


In Example 1.15 and in Exercise 16, Section 2.1, we explored the connection between finite 
accepters and some of the simpler constituents of programming languages, such as identifiers, or 
integers and real numbers. The relation between finite automata and regular expressions means that 
we can also use regular expressions as a way of describing these features. This is easy to see; for 
example, in many programming languages the set of integer constants is defined by the regular 
expression 


sdd*, 


where s stands for the sign, with possible values from {+, -, λ}, and d stands for the digits 0 to 9. 
Integer constants are a simple case of what is sometimes called a “pattern,” a term that refers to a set 
of objects having some common properties. Pattern matching refers to assigning a given object to one 
of several categories. Often, the key to successful pattern matching is finding an effective way to 
describe the patterns. This is a complicated and extensive area of computer science to which we can 
only briefly allude. The following example is a simplified, but nevertheless instructive, 
demonstration of how the ideas we have talked about so far have been found useful in pattern 
matching. 
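
Before turning to that example, the integer-constant pattern sdd* can be tried out directly. This is a small sketch only: it assumes Python's re module as the matcher, writes the optional sign s as [+-]? and the digit d as [0-9], and uses the hypothetical names INTEGER and is_integer_constant.

```python
import re

# sdd*: an optional sign, one digit, then any number of further digits.
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")


def is_integer_constant(text):
    """True if the whole of text matches the pattern sdd*."""
    return INTEGER.fullmatch(text) is not None
```

Because fullmatch requires the entire string to match, a bare sign or a string with a sign in the middle is rejected.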


Example 3.12 


An application of pattern matching occurs in text editing. All text editors allow files to be scanned for 
the occurrence of a given string; most editors extend this to permit searching for patterns. For 
example, the vi editor in the UNIX operating system recognizes the command /aba*c/ as an 
instruction to search the file for the first occurrence of the string ab, followed by an arbitrary number 
of a’s, followed by a c. We see from this example the need for pattern-matching editors to work with 
regular expressions. 


A challenging task in such an application is to write an efficient program for recognizing string 
patterns. Searching a file for occurrences of a given string is a very simple programming exercise, but 
here the situation is more complicated. We have to deal with an unlimited number of arbitrarily 
complicated patterns; furthermore, the patterns are not fixed beforehand, but created at run time. The 
pattern description is part of the input, so the recognition process must be flexible. To solve this 
problem, ideas from automata theory are often used. 


If the pattern is specified by a regular expression, the pattern-recognition program can take this 
description and convert it into an equivalent nfa using the construction in Theorem 3.1. Theorem 2.2 
may then be used to reduce this to a dfa. This dfa, in the form of a transition table, is effectively the 
pattern-matching algorithm. All the programmer has to do is to provide a driver that gives the general 
framework for using the table. In this way we can automatically handle a large number of patterns that 
are defined at run time. 
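
A driver of this kind can be very short. The sketch below is an illustration, not the program of any particular editor: the transition table is written by hand for the pattern aba*c, whereas a real pattern matcher would generate it from the regular expression, and the names TABLE, START, FINAL, and accepts are hypothetical. A missing table entry plays the role of a trap state.

```python
# Hand-written transition table for a dfa that accepts L(aba*c).
TABLE = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "a"): 2,
    (2, "c"): 3,
}
START = 0
FINAL = {3}


def accepts(table, start, final, text):
    """Table-driven driver: the same code works for any transition table."""
    state = start
    for symbol in text:
        state = table.get((state, symbol))
        if state is None:          # no entry: the dfa is in its trap state
            return False
    return state in final
```

The design point is that only the table changes from one pattern to the next; the driver itself contains no knowledge of the pattern.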


The efficiency of the program must also be considered. The construction of finite automata from 
regular expressions using Theorems 2.1 and 3.1 tends to yield automata with many states. If memory 
space is a problem, the state reduction method described in Section 2.4 is helpful. 


EXERCISES 


1. Use the construction in Theorem 3.1 to find an nfa that accepts the language L(ab*aa + bba*ab). 
2. Find an nfa that accepts the complement of the language in Exercise 1. 
3. Give an nfa that accepts the language L((a + b)* b(a + bb)*). 
4. Find dfa's that accept the following languages. 
(a) L(aa* + aba*b*). 
(b) L (ab (a + ab)* (a + aa)). 
(c) L ((abab)* + (aaa* + b)*). 
(d) L (((aa*)* b)*). 
5. Find dfa's that accept the following languages. 
(a) L = L(ab*a*) ∪ L((ab)* ba). 


(b) L = L(ab*a*) ∩ L((ab)* ba). 


6. Find an nfa for Exercise 17(f), Section 3.1. Use this to derive a regular expression for that 
language. 


7. Find the minimal dfa that accepts L(a*bb) ∪ L(ab*ba). 


8. Consider the following generalized transition graph. 


[Figure: generalized transition graph for Exercise 8.] 


(a) Find an equivalent generalized transition graph with only two states. 
(b) What is the language accepted by this graph? 
9. What language is accepted by the following generalized transition graph? 


[Figure: generalized transition graph for Exercise 9.] 


11. Rework Example 3.11, this time eliminating the state OO first. 


12. Show how all the labels in Figure 3.14 were obtained. 
13. Find a regular expression for the following languages on {a, b}. 
(a) L = {w : n_a(w) and n_b(w) are both even}. 
(b) L = {w : (n_a(w) - n_b(w)) mod 3 = 1}. 
(c) L = {w : (n_a(w) - n_b(w)) mod 3 = 0}. 
(d) L = {w : 2n_a(w) + 3n_b(w) is even}. 
14. Prove that the construction suggested by Figures 3.11 and 3.12 generates equivalent generalized 
transition graphs. 
15. Write a regular expression for the set of all C real numbers. 


16. In some applications, such as programs that check spelling, we may not need an exact match of 
the pattern, only an approximate one. Once the notion of an approximate match has been made 
precise, automata theory can be applied to construct approximate pattern matchers. As an 
illustration of this, consider patterns derived from the original ones by insertion of one symbol. 


Let L be a regular language on Σ and define 
insert(L) = {uav : a ∈ Σ, uv ∈ L}. 


In effect, insert(L) contains all the words created from L by inserting a spurious symbol 
anywhere in a word. 


* (a) Given an nfa for L, show how one can construct an nfa for insert (L). 


** (b) Discuss how you might use this to write a pattern-recognition program for insert (L), using as 
input a regular expression for L. 


* 17. Analogous to the previous exercise, consider all words that can be formed from L by dropping 
a single symbol of the string. Formally define this operation drop for languages. Construct an nfa 
for drop (L), given an nfa for L. 


18. Use the construction in Theorem 3.1 to find nfa's for L(aØ) and L(Ø*). Is the result consistent 
with the definition of these languages? 


3.3 Regular Grammars 


A third way of describing regular languages is by means of certain grammars. Grammars are often an 
alternative way of specifying languages. Whenever we define a language family through an automaton 
or in some other way, we are interested in knowing what kind of grammar we can associate with the 
family. First, we look at grammars that generate regular languages. 


Right- and Left-Linear Grammars 


Definition 3.3 


A grammar G = (V, T, S, P) is said to be right-linear if all productions are of the form 
A → xB, 
A → x, 

where A, B ∈ V, and x ∈ T*. A grammar is said to be left-linear if all productions are of the form 
A → Bx, 

or 
A → x. 

A regular grammar is one that is either right-linear or left-linear. 

Note that in a regular grammar, at most one variable appears on the right side of any production. 


Furthermore, that variable must consistently be either the rightmost or leftmost symbol of the right 
side of any production. 


Example 3.13 


The grammar G1 = ({S}, {a, b}, S, P1), with P1 given as 
S → abS | a 


is right-linear. The grammar G2 = ({S, S1, S2}, {a, b}, S, P2), with productions 


S → S1ab, 
S1 → S1ab | S2, 
S2 → a, 


is left-linear. Both G1 and G2 are regular grammars. 
The sequence 


S ⇒ abS ⇒ ababS ⇒ ababa 


is a derivation with G1. From this single instance it is easy to conjecture that L(G1) is the language 
denoted by the regular expression r = (ab)* a. In a similar way, we can see that L(G2) is the regular 
language L(aab(ab)*). 


Example 3.14 


The grammar G = ({S, A, B}, {a, b}, S, P) with productions 


S → A, 
A → aB | λ, 
B → Ab, 

is not regular. Although every production is either in right-linear or left-linear form, the grammar 
itself is neither right-linear nor left-linear, and therefore is not regular. The grammar is an example of 
a linear grammar. 


A linear grammar is a grammar in which at most one variable can occur on the right side of any 
production, without restriction on the position of this variable. Clearly, a regular grammar is always 
linear, but not all linear grammars are regular. 
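
The distinctions drawn in Definition 3.3 and Examples 3.13 and 3.14 depend only on where variables sit on the right sides of productions, so they are easy to check by program. The sketch below is an illustration: it assumes each production is given as a pair (A, rhs) with rhs a list of symbols, and the function name classify and the constants G1, G2, and G3 are hypothetical. A grammar that happens to be both right-linear and left-linear is reported as right-linear.

```python
def classify(variables, productions):
    """Return 'right-linear', 'left-linear', 'linear', or 'not linear'."""
    right = left = True
    for _, rhs in productions:
        positions = [i for i, s in enumerate(rhs) if s in variables]
        if len(positions) > 1:
            return "not linear"            # more than one variable
        if positions:
            # The single variable must be last (right) or first (left).
            right = right and positions[0] == len(rhs) - 1
            left = left and positions[0] == 0
    if right:
        return "right-linear"
    if left:
        return "left-linear"
    return "linear"


# G1 and G2 of Example 3.13 and the grammar of Example 3.14.
# An empty right side stands for λ.
G1 = [("S", ["a", "b", "S"]), ("S", ["a"])]
G2 = [("S", ["S1", "a", "b"]),
      ("S1", ["S1", "a", "b"]), ("S1", ["S2"]),
      ("S2", ["a"])]
G3 = [("S", ["A"]), ("A", ["a", "B"]), ("A", []), ("B", ["A", "b"])]
```

For the grammar of Example 3.14 the function reports "linear": every production passes one of the two positional tests, but no single test is passed by all of them.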


Our next goal will be to show that regular grammars are associated with regular languages and 
that for every regular language there is a regular grammar. Thus, regular grammars are another way of 
talking about regular languages. 


Right-Linear Grammars Generate Regular Languages 


First, we show that a language generated by a right-linear grammar is always regular. To do so, we 
construct an nfa that mimics the derivations of a right-linear grammar. Note that the sentential forms of 
a right-linear grammar have the special form in which there is exactly one variable and it occurs as 
the rightmost symbol. Suppose now that we have a step in a derivation 


ab···cD ⇒ ab···cdE, 


arrived at by using a production D → dE. The corresponding nfa can imitate this step by going from 
state D to state E when a symbol d is encountered. In this scheme, the state of the automaton 
corresponds to the variable in the sentential form, while the part of the string already processed is 
identical to the terminal prefix of the sentential form. This simple idea is the basis for the following 
theorem. 


Theorem 3.3 


Let G=(V, T, S, P) be a right-linear grammar. Then L (G) is a regular language. 
Proof: We assume that V = {V0, V1, ...}, that S = V0, and that we have productions of the form V0 → 
v1Vi, Vi → v2Vj, ..., or Vn → vl, .... If w is a string in L(G), then because of the form of the productions 


V0 ⇒ v1Vi 
⇒ v1v2Vj 
⇒* v1v2···vkVn 
⇒ v1v2···vkvl = w. (3.4) 


The automaton to be constructed will reproduce the derivation by consuming each of these v’s in turn. 
The initial state of the automaton will be labeled V0, and for each variable Vi there will be a nonfinal 
state labeled Vi. For each production 


Vi → a1a2···amVj, 


the automaton will have transitions to connect Vi and Vj; that is, δ will be defined so that 


δ*(Vi, a1a2···am) = Vj. 


For each production 


Vi → a1a2···am, 


the corresponding transition of the automaton will be 


δ*(Vi, a1a2···am) = Vf, 


where Vf is a final state. The intermediate states that are needed to do this are of no concern and can 
be given arbitrary labels. The general scheme is shown in Figure 3.16. The complete automaton is 
assembled from such individual parts. 


Figure 3.16 


Represents Vi → a1a2···amVj. 
Represents Vi → a1a2···am. 


Suppose now that w ∈ L(G) so that (3.4) is satisfied. In the nfa there is, by construction, a path 
from V0 to Vi labeled v1, a path from Vi to Vj labeled v2, and so on, so that clearly 


Vf ∈ δ*(V0, w), 


and w is accepted by M. 


Conversely, assume that w is accepted by M. Because of the way in which M was constructed, to 
accept w the automaton has to pass through a sequence of states V0, Vi, ... to Vf, using paths labeled 
v1, v2, .... Therefore, w must have the form 


w = v1v2···vkvl 


and the derivation 


V0 ⇒ v1Vi ⇒ v1v2Vj ⇒* v1v2···vkVk ⇒ v1v2···vkvl 


is possible. Hence w is in L(G), and the theorem is proved. ■ 


Example 3.15 
Construct a finite automaton that accepts the language generated by the grammar 


V0 → aV1, 
V1 → abV0 | b, 


where V0 is the start variable. We start the transition graph with vertices V0, V1, and Vf. The first 
production rule creates an edge labeled a between V0 and V1. For the second rule, we need to 
introduce an additional vertex so that there is a path labeled ab between V1 and V0. Finally, we need 
to add an edge labeled b between V1 and Vf, giving the automaton shown in Figure 3.17. The language 
generated by the grammar and accepted by the automaton is the regular language L((aab)* ab). 


Figure 3.17 
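
The construction used in Theorem 3.3 and Example 3.15 can be written out as a short program. This is a sketch under stated assumptions, not a definitive implementation: a production A → xB is given as the triple (A, x, B), a production A → x as (A, x, None), the single final state is given the hypothetical label "Vf" (assumed not to clash with any variable name), and an empty x is handled with a λ-transition, written as the empty string.

```python
from itertools import count


def grammar_to_nfa(productions, start):
    """Build an nfa from the productions of a right-linear grammar."""
    final = "Vf"
    fresh = count()
    delta = {}  # (state, symbol) -> set of states; symbol "" is a λ-move

    def add(p, a, q):
        delta.setdefault((p, a), set()).add(q)

    for A, x, B in productions:
        target = final if B is None else B
        if not x:                          # A -> B or A -> λ
            add(A, "", target)
            continue
        state = A
        for a in x[:-1]:                   # intermediate states for a1 ... am
            nxt = ("mid", next(fresh))
            add(state, a, nxt)
            state = nxt
        add(state, x[-1], target)
    return delta, start, final


def closure(delta, states):
    """All states reachable from the given ones by λ-moves alone."""
    stack, seen = list(states), set(states)
    while stack:
        for q in delta.get((stack.pop(), ""), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(nfa, w):
    delta, start, final = nfa
    current = closure(delta, {start})
    for a in w:
        moved = set()
        for p in current:
            moved |= delta.get((p, a), set())
        current = closure(delta, moved)
    return final in current


# The grammar of Example 3.15: V0 -> aV1, V1 -> abV0 | b.
EXAMPLE_3_15 = grammar_to_nfa(
    [("V0", "a", "V1"), ("V1", "ab", "V0"), ("V1", "b", None)], "V0")
```

For this grammar the program introduces exactly one intermediate state, for the path labeled ab between V1 and V0, in line with the description in the example.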


Right-Linear Grammars for Regular Languages 


To show that every regular language can be generated by some right-linear grammar, we start from 
the dfa for the language and reverse the construction shown in Theorem 3.3. The states of the dfa now 
become the variables of the grammar, and the symbols causing the transitions become the terminals in 
the productions. 


Theorem 3.4 


If L is a regular language on the alphabet Σ, then there exists a right-linear grammar G = (V, Σ, S, P) 
such that L = L(G). 


Proof: Let M = (Q, Σ, δ, q0, F) be a dfa that accepts L. We assume that Q = {q0, q1, ..., qn} and Σ = 
{a1, a2, ..., am}. Construct the right-linear grammar G = (V, Σ, S, P) with 


V = {q0, q1, ..., qn} 
and S = q0. For each transition 
δ(qi, aj) = qk 
of M, we put in P the production 
qi → aj qk. (3.5) 
In addition, if qk is in F, we add to P the production 
qk → λ. (3.6) 


We first show that G defined in this way can generate every string in L. Consider w ∈ L of the 
form 


w = ai aj ··· ak al. 


For M to accept this string it must make moves via 


δ(q0, ai) = qp, 
δ(qp, aj) = qr, 
... 
δ(qs, ak) = qt, 
δ(qt, al) = qf ∈ F. 


By construction, the grammar will have one production for each of these δ’s. Therefore, we can make 
the derivation 


q0 ⇒ ai qp ⇒ ai aj qr ⇒* ai aj ··· ak qt 
⇒ ai aj ··· ak al qf ⇒ ai aj ··· ak al (3.7) 


with the grammar G, and w ∈ L(G). 
Conversely, if w ∈ L(G), then its derivation must have the form (3.7). But this implies that 


δ*(q0, ai aj ··· ak al) = qf, 


completing the proof. ■ 


For the purpose of constructing a grammar, it is useful to note that the restriction that M be a dfa is 
not essential to the proof of Theorem 3.4. With minor modification, the same construction can be used 
if M is an nfa. 


Example 3.16 


Construct a right-linear grammar for $L(aab^*a)$. The transition function for an nfa, together with the corresponding grammar productions, is given in Figure 3.18. The result was obtained by simply following the construction in Theorem 3.4. The string $aaba$ can be derived with the constructed grammar by

$$q_0 \Rightarrow a q_1 \Rightarrow aa q_2 \Rightarrow aab q_2 \Rightarrow aaba q_f \Rightarrow aaba.$$


Figure 3.18 


Equivalence of Regular Languages and Regular Grammars 

The previous two theorems establish the connection between regular languages and right-linear grammars. One can make a similar connection between regular languages and left-linear grammars, thereby showing the complete equivalence of regular grammars and regular languages.


Theorem 3.5 


A language $L$ is regular if and only if there exists a left-linear grammar $G$ such that $L = L(G)$.

Proof: We only outline the main idea. Given any left-linear grammar with productions of the form

$$A \to Bv,$$

or

$$A \to v,$$

we construct from it a right-linear grammar $\hat{G}$ by replacing every such production of $G$ with

$$A \to v^R B,$$

or

$$A \to v^R,$$

respectively. A few examples will make it clear quickly that $L(G) = (L(\hat{G}))^R$. Next, we use Exercise 12, Section 2.3, which tells us that the reverse of any regular language is also regular. Since $\hat{G}$ is right-linear, $L(\hat{G})$ is regular. But then so are $(L(\hat{G}))^R$ and $L(G)$. ■


Putting Theorems 3.4 and 3.5 together, we arrive at the equivalence of regular languages and regular 
grammars. 


Theorem 3.6 


A language L is regular if and only if there exists a regular grammar G such that L = L(G). 


Figure 3.19 (Diagram: regular expressions and dfa's or nfa's are linked by Theorems 3.1 and 3.2; dfa's or nfa's and regular grammars are linked by Theorems 3.3 and 3.4.)


We now have several ways of describing regular languages: dfa's, nfa's, regular expressions, and 
regular grammars. While in some instances one or the other of these may be most suitable, they are all
equally powerful. Each gives a complete and unambiguous definition of a regular language. The 
connection between all these concepts is established by the four theorems in this chapter, as shown in 
Figure 3.19. 


EXERCISES 


1. Construct a dfa that accepts the language generated by the grammar

$$\begin{aligned}
S &\to abA, \\
A &\to baB, \\
B &\to aA \mid bb.
\end{aligned}$$


2. Find a regular grammar that generates the language L (aa* (ab+ a)*). 


3. Construct a left-linear grammar for the language in Exercise 1. 


4. Construct right- and left-linear grammars for the language

$$L = \{a^n b^m : n \ge 2, m \ge 3\}.$$

5. Adapt the construction in Theorem 3.4 to find a left-linear grammar for the language accepted by the nfa below.

6. Construct a right-linear grammar for the language $L((aab^*ab)^*)$.

7. Find a regular grammar that generates the language on $\Sigma = \{a, b\}$ consisting of all strings with no more than three a's.

8. In Theorem 3.5, prove that $L(\hat{G}) = (L(G))^R$.

9. Suggest a construction by which a left-linear grammar can be obtained from an nfa directly.


10. Find a left-linear grammar for the language in Exercise 6. 


11. Find a regular grammar for the language $L = \{a^n b^m : n + m \text{ is even}\}$.

12. Find a regular grammar that generates the language

$$L = \{w \in \{a, b\}^* : n_a(w) + 3n_b(w) \text{ is even}\}.$$

13. Find regular grammars for the following languages on $\{a, b\}$.

(a) $L = \{w : n_a(w) \text{ and } n_b(w) \text{ are both even}\}$.
(b) $L = \{w : (n_a(w) - n_b(w)) \bmod 3 = 1\}$.
(c) $L = \{w : (n_a(w) - n_b(w)) \bmod 3 \ne 1\}$.
(d) $L = \{w : (n_a(w) - n_b(w)) \bmod 3 \ne 0\}$.
(e) $L = \{w : |n_a(w) - n_b(w)| \text{ is odd}\}$.


14. Show that for every regular language not containing $\lambda$ there exists a right-linear grammar whose productions are restricted to the forms

$$A \to aB,$$

or

$$A \to a,$$

where $A, B \in V$, and $a \in T$.

15. Show that any regular grammar $G$ for which $L(G) \ne \emptyset$ must have at least one production of the form

$$A \to x$$

where $A \in V$ and $x \in T^*$.

16. Find a regular grammar that generates the set of all real numbers in C.

17. Let $G_1 = (V_1, \Sigma, S_1, P_1)$ be right-linear and $G_2 = (V_2, \Sigma, S_2, P_2)$ be a left-linear grammar, and assume that $V_1$ and $V_2$ are disjoint. Consider the linear grammar $G = (\{S\} \cup V_1 \cup V_2, \Sigma, S, P)$, where $S$ is not in $V_1 \cup V_2$ and $P = \{S \to S_1 \mid S_2\} \cup P_1 \cup P_2$. Show that $L(G)$ is regular.


Chapter 4 


Properties of 
Regular Languages 


We have defined regular languages, studied some ways in which they can be represented, and have seen a few examples of their usefulness. We now raise the question of how general regular languages are. Could it be that every formal language is regular? Perhaps any set can be accepted by some, albeit very complex, finite automaton. As we will see shortly, the answer to this conjecture is definitely no. But to understand why this is so, we must inquire more deeply into the nature of regular languages and see what properties the whole family has.


The first question we raise is what happens when we perform operations on regular languages. 
The operations we consider are simple set operations, such as concatenation, as well as operations in 
which each string of a language is changed, as for instance in Exercise 24, Section 2.1. Is the resulting 
language still regular? We refer to this as a closure question. Closure properties, although mostly of 
theoretical interest, help us in discriminating between the various language families we will 
encounter. 


A second set of questions about language families deals with our ability to decide on certain 
properties. For example, can we tell whether a language is finite or not? As we will see, such 
questions are readily answered for regular languages, but are not as easy for other language families. 


Finally we consider the important question: How can we tell whether a given language is regular 
or not? If the language is in fact regular, we can always show it by giving some dfa, regular 
expression, or regular grammar for it. But if it is not, we need another line of attack. One way to show 
a language is not regular is to study the general properties of regular languages, that is, characteristics 
that are shared by all regular languages. If we know of some such property, and if we can show that 
the candidate language does not have it, then we can tell that the language is not regular. 


In this chapter, we look at a variety of properties of regular languages. These properties tell us a 
great deal about what regular languages can and cannot do. Later, when we look at the same questions 
for other language families, similarities and differences in these properties will allow us to contrast 
the various language families. 


4.1 Closure Properties of Regular Languages 

Consider the following question: Given two regular languages $L_1$ and $L_2$, is their union also regular? In specific instances, the answer may be obvious, but here we want to address the problem in general. Is it true for all regular $L_1$ and $L_2$? It turns out that the answer is yes, a fact we express by saying that the family of regular languages is closed under union. We can ask similar questions about other types of operations on languages; this leads us to the study of the closure properties of languages in general.


Closure properties of various language families under different operations are of considerable 
theoretical interest. At first sight, it may not be clear what practical significance these properties 
have. Admittedly, some of them have very little, but many results are useful. By giving us insight into 
the general nature of language families, closure properties help us answer other, more practical 
questions. We will see instances of this (Theorem 4.7 and Example 4.13) later in this chapter. 


Closure under Simple Set Operations 


We begin by looking at the closure of regular languages under the common set operations, such as 
union and intersection. 


Theorem 4.1 


If $L_1$ and $L_2$ are regular languages, then so are $L_1 \cup L_2$, $L_1 \cap L_2$, $L_1 L_2$, $\overline{L_1}$, and $L_1^*$. We say that the family of regular languages is closed under union, intersection, concatenation, complementation, and star-closure.


Proof: If $L_1$ and $L_2$ are regular, then there exist regular expressions $r_1$ and $r_2$ such that $L_1 = L(r_1)$ and $L_2 = L(r_2)$. By definition, $r_1 + r_2$, $r_1 r_2$, and $r_1^*$ are regular expressions denoting the languages $L_1 \cup L_2$, $L_1 L_2$, and $L_1^*$, respectively. Thus, closure under union, concatenation, and star-closure is immediate.


To show closure under complementation, let $M = (Q, \Sigma, \delta, q_0, F)$ be a dfa that accepts $L_1$. Then the dfa

$$\hat{M} = (Q, \Sigma, \delta, q_0, Q - F)$$

accepts $\overline{L_1}$. This is rather straightforward; we have already suggested the result in Exercise 4 in Section 2.1. Note that in the definition of a dfa, we assumed $\delta^*$ to be a total function, so that $\delta^*(q_0, w)$ is defined for all $w \in \Sigma^*$. Consequently either $\delta^*(q_0, w)$ is a final state, in which case $w \in L_1$, or $\delta^*(q_0, w) \in Q - F$ and $w \in \overline{L_1}$.

Demonstrating closure under intersection takes a little more work. Let $L_1 = L(M_1)$ and $L_2 = L(M_2)$, where $M_1 = (Q, \Sigma, \delta_1, q_0, F_1)$ and $M_2 = (P, \Sigma, \delta_2, p_0, F_2)$ are dfa's. We construct from $M_1$ and $M_2$ a combined automaton

$$\hat{M} = (\hat{Q}, \Sigma, \hat{\delta}, (q_0, p_0), \hat{F}),$$

whose state set $\hat{Q} = Q \times P$ consists of pairs $(q_i, p_j)$, and whose transition function $\hat{\delta}$ is such that $\hat{M}$ is in state $(q_i, p_j)$ whenever $M_1$ is in state $q_i$ and $M_2$ is in state $p_j$. This is achieved by taking

$$\hat{\delta}((q_i, p_j), a) = (q_k, p_l),$$

whenever

$$\delta_1(q_i, a) = q_k$$

and

$$\delta_2(p_j, a) = p_l.$$

$\hat{F}$ is defined as the set of all $(q_i, p_j)$ such that $q_i \in F_1$ and $p_j \in F_2$. Then it is a simple matter to show that $w \in L_1 \cap L_2$ if and only if it is accepted by $\hat{M}$. Consequently, $L_1 \cap L_2$ is regular. ■
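
The combined automaton can be built directly from the two transition tables. The following Python sketch is one way to do this, under the assumption that each dfa is stored as a tuple `(states, delta, start, finals)` with a total transition dictionary; the function names and the two sample dfa's are hypothetical and chosen only for illustration.

```python
from itertools import product


def intersect(dfa1, dfa2, alphabet):
    """Product automaton accepting the intersection of the two languages.

    Each dfa is a tuple (states, delta, start, finals) with a total
    transition function delta[(state, symbol)] -> state.
    """
    states1, delta1, start1, finals1 = dfa1
    states2, delta2, start2, finals2 = dfa2
    states = set(product(states1, states2))          # Q x P
    delta = {((q, p), a): (delta1[(q, a)], delta2[(p, a)])
             for (q, p) in states for a in alphabet}
    finals = {(q, p) for (q, p) in states
              if q in finals1 and p in finals2}      # both components final
    return states, delta, (start1, start2), finals


def accepts(dfa, w):
    """Run the dfa on w and report whether it halts in a final state."""
    _, delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals


# Hypothetical example: "even number of a's" intersected with "ends in b".
even_a = ({"q0", "q1"},
          {("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q0", ("q1", "b"): "q1"},
          "q0", {"q0"})
ends_b = ({"p0", "p1"},
          {("p0", "a"): "p0", ("p0", "b"): "p1",
           ("p1", "a"): "p0", ("p1", "b"): "p1"},
          "p0", {"p1"})
both = intersect(even_a, ends_b, "ab")
print(accepts(both, "aab"), accepts(both, "ab"))
```

The sketch builds all of $Q \times P$, as in the proof; a practical implementation would usually generate only the pairs reachable from $(q_0, p_0)$.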


The proof of closure under intersection is a good example of a constructive proof. Not only does 
it establish the desired result, but it also shows explicitly how to construct a finite accepter for the 
intersection of two regular languages. Constructive proofs occur throughout this book; they are 
important because they give us insight into the results and often serve as the starting point for 
practical algorithms. Here, as in many cases, there are shorter but nonconstructive (or at least not so 
obviously constructive) arguments. For closure under intersection, we start with DeMorgan's law, 
Equation (1.3), taking the complement of both sides. Then 


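$$L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$$

for any languages $L_1$ and $L_2$. Now, if $L_1$ and $L_2$ are regular, then by closure under complementation, so are $\overline{L_1}$ and $\overline{L_2}$. Using closure under union, we next get that $\overline{L_1} \cup \overline{L_2}$ is regular. Using closure under complementation once more, we see that

$$\overline{\overline{L_1} \cup \overline{L_2}} = L_1 \cap L_2$$

is regular.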
The following example is a variation on the same idea. 


Example 4.1 


Show that the family of regular languages is closed under difference. In other words, we want to show that if $L_1$ and $L_2$ are regular, then $L_1 - L_2$ is necessarily regular also.

The needed set identity is immediately obvious from the definition of a set difference, namely

$$L_1 - L_2 = L_1 \cap \overline{L_2}.$$

The fact that $L_2$ is regular implies that $\overline{L_2}$ is also regular. Then, because of the closure of regular languages under intersection, we know that $L_1 \cap \overline{L_2}$ is regular, and the argument is complete.


A variety of other closure properties can be derived directly by elementary arguments. 


Theorem 4.2 


The family of regular languages is closed under reversal. 


Proof: The proof of this theorem was suggested as an exercise in Section 2.3. Here are the details. 
Suppose that $L$ is a regular language. We then construct an nfa with a single final state for it. By Exercise 7, Section 2.3, this is always possible. In the transition graph for this nfa we make the initial vertex a final vertex, the final vertex the initial vertex, and reverse the direction on all the edges. It is a fairly straightforward matter to show that the modified nfa accepts $w^R$ if and only if the original nfa accepts $w$. Therefore, the modified nfa accepts $L^R$, proving closure under reversal. ■
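
The edge-reversal step is easy to try out. The Python sketch below assumes an nfa without $\lambda$-transitions, stored as a set of labeled edges `(p, a, q)` with one initial and one final vertex; this encoding, the function names, and the sample nfa for $L(ab^*)$ are assumptions made for the illustration.

```python
def reverse_nfa(edges, start, final):
    """Swap initial and final vertices and reverse every edge (p, a, q)."""
    return {(q, a, p) for (p, a, q) in edges}, final, start


def nfa_accepts(edges, start, final, w):
    """Simulate an nfa without lambda-transitions by tracking reachable states."""
    current = {start}
    for a in w:
        current = {q for (p, label, q) in edges if p in current and label == a}
    return final in current


# Hypothetical nfa with one final state for L(ab*).
edges = {("s", "a", "f"), ("f", "b", "f")}
r_edges, r_start, r_final = reverse_nfa(edges, "s", "f")
print(nfa_accepts(edges, "s", "f", "abb"))              # original accepts abb
print(nfa_accepts(r_edges, r_start, r_final, "bba"))    # reversed accepts bba
```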


Closure under Other Operations 


In addition to the standard operations on languages, one can define other operations and investigate 
closure properties for them. There are many such results; we select only two typical ones. Others are 
explored in the exercises at the end of this section. 


Definition 4.1 


Suppose $\Sigma$ and $\Gamma$ are alphabets. Then a function

$$h : \Sigma \to \Gamma^*$$

is called a homomorphism. In words, a homomorphism is a substitution in which a single letter is replaced with a string. The domain of the function $h$ is extended to strings in an obvious fashion; if

$$w = a_1 a_2 \cdots a_n,$$

then

$$h(w) = h(a_1) h(a_2) \cdots h(a_n).$$

If $L$ is a language on $\Sigma$, then its homomorphic image is defined as

$$h(L) = \{h(w) : w \in L\}.$$


Example 4.2 


Let $\Sigma = \{a, b\}$ and $\Gamma = \{a, b, c\}$ and define $h$ by

$$h(a) = ab,$$

$$h(b) = bbc.$$

Then $h(aba) = abbbcab$. The homomorphic image of $L = \{aa, aba\}$ is the language $h(L) = \{abab, abbbcab\}$.
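
The extension of $h$ from letters to strings and languages can be checked with a few lines of Python. This is a minimal sketch; the dictionary encoding of $h$ and the function names are assumptions, and only finite languages given as sets of strings are handled.

```python
def apply_homomorphism(h, w):
    """Extend h from single symbols to the string w by concatenating the images."""
    return "".join(h[a] for a in w)


def homomorphic_image(h, language):
    """Image of a finite language, given as a set of strings."""
    return {apply_homomorphism(h, w) for w in language}


h = {"a": "ab", "b": "bbc"}          # the homomorphism of Example 4.2
print(apply_homomorphism(h, "aba"))  # abbbcab
print(sorted(homomorphic_image(h, {"aa", "aba"})))  # ['abab', 'abbbcab']
```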


If we have a regular expression $r$ for a language $L$, then a regular expression for $h(L)$ can be obtained by simply applying the homomorphism to each $\Sigma$ symbol of $r$.


Example 4.3 


Take $\Sigma = \{a, b\}$ and $\Gamma = \{b, c, d\}$. Define $h$ by

$$h(a) = dbcc,$$

$$h(b) = bdc.$$

If $L$ is the regular language denoted by

$$r = (a + b^*)(aa)^*,$$

then

$$r_1 = (dbcc + (bdc)^*)(dbccdbcc)^*$$

denotes the regular language $h(L)$.


The general result on the closure of regular languages under any homomorphism follows from this 
example in an obvious manner. 


Theorem 4.3 


Let $h$ be a homomorphism. If $L$ is a regular language, then its homomorphic image $h(L)$ is also regular. The family of regular languages is therefore closed under arbitrary homomorphisms.


Proof: Let $L$ be a regular language denoted by some regular expression $r$. We find $h(r)$ by substituting $h(a)$ for each symbol $a \in \Sigma$ of $r$. It can be shown directly by an appeal to the definition of a regular expression that the result is a regular expression. It is equally easy to see that the resulting expression denotes $h(L)$. All we need to do is to show that for every $w \in L(r)$, the corresponding $h(w)$ is in $L(h(r))$ and conversely that for every $v$ in $L(h(r))$ there is a $w$ in $L$ such that $v = h(w)$. Leaving the details as an exercise, we claim that $h(L)$ is regular. ■


Definition 4.2 


Let $L_1$ and $L_2$ be languages on the same alphabet. Then the right quotient of $L_1$ with $L_2$ is defined as

$$L_1/L_2 = \{x : xy \in L_1 \text{ for some } y \in L_2\}. \tag{4.1}$$

To form the right quotient of $L_1$ with $L_2$, we take all the strings in $L_1$ that have a suffix belonging to $L_2$. Every such string, after removal of this suffix, belongs to $L_1/L_2$.


Example 4.4 
If

$$L_1 = \{a^n b^m : n \ge 1, m \ge 0\} \cup \{ba\}$$

and

$$L_2 = \{b^m : m \ge 1\},$$

then

$$L_1/L_2 = \{a^n b^m : n \ge 1, m \ge 0\}.$$

The strings in $L_2$ consist of one or more b's. Therefore, we arrive at the answer by removing one or more b's from those strings in $L_1$ that terminate with at least one b.


Note that here $L_1$, $L_2$, and $L_1/L_2$ are all regular. This suggests that the right quotient of any two regular languages is also regular. We will prove this in the next theorem by a construction that takes the dfa's for $L_1$ and $L_2$ and constructs from them a dfa for $L_1/L_2$. Before we describe the construction in full, let us see how it applies to this example. We start with a dfa for $L_1$; say the automaton $M_1 = (Q, \Sigma, \delta, q_0, F)$ in Figure 4.1. Since an automaton for $L_1/L_2$ must accept any prefix of strings in $L_1$, we will try to modify $M_1$ so that it accepts $x$ if there is any $y$ satisfying (4.1). The difficulty comes in finding whether there is some $y$ such that $xy \in L_1$ and $y \in L_2$. To solve it, we determine, for each $q \in Q$, whether there is a walk to a final state labeled $v$ such that $v \in L_2$. If this is so, any $x$ such that $\delta^*(q_0, x) = q$ will be in $L_1/L_2$. We modify the automaton accordingly to make $q$ a final state.

To apply this to our present case, we check each state $q_0, q_1, q_2, q_3, q_4, q_5$ to see whether there is a walk labeled $bb^*$ to any of the $q_1$, $q_2$, or $q_4$. We see that only $q_1$ and $q_2$ qualify; $q_0$, $q_3$, $q_4$ do not. The resulting automaton for $L_1/L_2$ is shown in Figure 4.2. Check it to see that the construction works. The idea is generalized in the next theorem.


Figure 4.1 


Figure 4.2 


Theorem 4.4 


If $L_1$ and $L_2$ are regular languages, then $L_1/L_2$ is also regular. We say that the family of regular languages is closed under right quotient with a regular language.


Proof: Let $L_1 = L(M)$, where $M = (Q, \Sigma, \delta, q_0, F)$ is a dfa. We construct another dfa

$$\hat{M} = (Q, \Sigma, \delta, q_0, \hat{F})$$

as follows. For each $q_i \in Q$, determine if there exists a $y \in L_2$ such that

$$\delta^*(q_i, y) = q_f \in F.$$

This can be done by looking at dfa's $M_i = (Q, \Sigma, \delta, q_i, F)$. The automaton $M_i$ is $M$ with the initial state $q_0$ replaced by $q_i$. We now determine whether there exists a $y$ in $L(M_i)$ that is also in $L_2$. For this, we can use the construction for the intersection of two regular languages given in Theorem 4.1, finding the transition graph for $L_2 \cap L(M_i)$. If there is any path between its initial vertex and any final vertex, then $L_2 \cap L(M_i)$ is not empty. In that case, add $q_i$ to $\hat{F}$. Repeating this for every $q_i \in Q$, we determine $\hat{F}$ and thereby construct $\hat{M}$.

To prove that $L(\hat{M}) = L_1/L_2$, let $x$ be any element of $L_1/L_2$. Then there must be a $y \in L_2$ such that $xy \in L_1$. This implies that

$$\delta^*(q_0, xy) \in F,$$

so that there must be some $q \in Q$ such that

$$\delta^*(q_0, x) = q$$

and

$$\delta^*(q, y) \in F.$$

Therefore, by construction, $q \in \hat{F}$, and $\hat{M}$ accepts $x$ because $\delta^*(q_0, x)$ is in $\hat{F}$.

Conversely, for any $x$ accepted by $\hat{M}$ we have

$$\delta^*(q_0, x) = q \in \hat{F}.$$

But again by construction, this implies that there exists a $y \in L_2$ such that $\delta^*(q, y) \in F$. Therefore, $xy$ is in $L_1$, and $x$ is in $L_1/L_2$. We therefore conclude that

$$L(\hat{M}) = L_1/L_2,$$

and from this that $L_1/L_2$ is regular. ■
Example 4.5 
Find $L_1/L_2$ for

$$L_1 = L(a^*baa^*),$$

$$L_2 = L(ab^*).$$

We first find a dfa that accepts $L_1$. This is easy, and a solution is given in Figure 4.3. The example is simple enough so that we can skip the formalities of the construction. From the graph in Figure 4.3 it is quite evident that

$$\begin{aligned}
L(M_0) \cap L_2 &= \emptyset, \\
L(M_1) \cap L_2 &= \{a\} \ne \emptyset, \\
L(M_2) \cap L_2 &= \{a\} \ne \emptyset, \\
L(M_3) \cap L_2 &= \emptyset.
\end{aligned}$$

Therefore, the automaton accepting $L_1/L_2$ is determined. The result is shown in Figure 4.4. It accepts the language denoted by the regular expression $a^*b + a^*baa^*$, which can be simplified to $a^*ba^*$. Thus $L_1/L_2 = L(a^*ba^*)$.
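
The construction of Theorem 4.4 can be sketched in Python and tried on this example. The dfa's below are assumptions: the figures are not reproduced here, so `m1` and `m2` are dfa's written out by hand for $L(a^*baa^*)$ and $L(ab^*)$, with state names chosen to agree with the intersections listed above. The function names are likewise hypothetical. Instead of building each product automaton in full, the sketch searches only its reachable part, which is enough to decide whether $L_2 \cap L(M_i)$ is empty.

```python
def right_quotient(dfa1, dfa2, alphabet):
    """Return a dfa for L(dfa1)/L(dfa2): dfa1 with a new set of final states."""
    states1, delta1, start1, finals1 = dfa1
    _, delta2, start2, finals2 = dfa2

    def meets_l2(q_i):
        """True if some y in L2 leads from q_i to a final state of dfa1."""
        # Search the product of M_i (dfa1 started at q_i) and dfa2.
        seen, stack = {(q_i, start2)}, [(q_i, start2)]
        while stack:
            q, p = stack.pop()
            if q in finals1 and p in finals2:
                return True                     # the intersection is not empty
            for a in alphabet:
                nxt = (delta1[(q, a)], delta2[(p, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    new_finals = {q for q in states1 if meets_l2(q)}
    return states1, delta1, start1, new_finals


def accepts(dfa, w):
    """Run the dfa on w and report whether it halts in a final state."""
    _, delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals


# Assumed dfa for L1 = L(a*baa*); q3 is a trap state.
m1 = ({"q0", "q1", "q2", "q3"},
      {("q0", "a"): "q0", ("q0", "b"): "q1",
       ("q1", "a"): "q2", ("q1", "b"): "q3",
       ("q2", "a"): "q2", ("q2", "b"): "q3",
       ("q3", "a"): "q3", ("q3", "b"): "q3"},
      "q0", {"q2"})
# Assumed dfa for L2 = L(ab*); p2 is a trap state.
m2 = ({"p0", "p1", "p2"},
      {("p0", "a"): "p1", ("p0", "b"): "p2",
       ("p1", "a"): "p2", ("p1", "b"): "p1",
       ("p2", "a"): "p2", ("p2", "b"): "p2"},
      "p0", {"p1"})
quotient = right_quotient(m1, m2, "ab")
print(sorted(quotient[3]))   # ['q1', 'q2']
```

With $q_1$ and $q_2$ as the new final states, the resulting dfa accepts exactly the strings containing a single b, in agreement with $L(a^*ba^*)$.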


Figure 4.3 


Figure 4.4 


EXERCISES 


1. Fill in the details of the constructive proof of closure under intersection in Theorem 4.1. 
2. Use the construction in Theorem 4.1 to find nfa's that accept

(a) $L((a + b)a^*) \cap L(baa^*)$.

(b) $L(ab^*a^*) \cap L(a^*b^*a)$.


3. In Example 4.1 we showed closure under difference for regular languages, but the proof was 


nonconstructive. Provide a constructive argument for this result, following the approach used in 
the argument for intersection in Theorem 4.1. 


4. In the proof of Theorem 4.3, show that $h(r)$ is a regular expression. Then show that $h(r)$ denotes $h(L)$.

5. Show that the family of regular languages is closed under finite union and intersection, that is, if $L_1, L_2, \ldots, L_n$ are regular, then

$$L_U = \bigcup_{i \in \{1, 2, \ldots, n\}} L_i$$

and

$$L_I = \bigcap_{i \in \{1, 2, \ldots, n\}} L_i$$

are also regular.

6. The symmetric difference of two sets $S_1$ and $S_2$ is defined as

$$S_1 \ominus S_2 = \{x : x \in S_1 \text{ or } x \in S_2, \text{ but } x \text{ is not in both } S_1 \text{ and } S_2\}.$$

Show that the family of regular languages is closed under symmetric difference.

7. The nor of two languages is

$$nor(L_1, L_2) = \{w : w \notin L_1 \text{ and } w \notin L_2\}.$$

Show that the family of regular languages is closed under the nor operation.

8. Define the complementary or (cor) of two languages by

$$cor(L_1, L_2) = \{w : w \in \overline{L_1} \text{ or } w \in \overline{L_2}\}.$$

Show that the family of regular languages is closed under the cor operation.

9. Which of the following are true for all regular languages and all homomorphisms?

(a) $h(L_1 \cup L_2) = h(L_1) \cap h(L_2)$.

(b) $h(L_1 \cap L_2) = h(L_1) \cap h(L_2)$.

(c) $h(L_1 L_2) = h(L_1) h(L_2)$.

10. Let $L_1 = L(a^*baa^*)$ and $L_2 = L(aba^*)$. Find $L_1/L_2$.

11. Show that $L_1 = L_1 L_2/L_2$ is not true for all languages $L_1$ and $L_2$.


*12. Suppose we know that $L_1 \cup L_2$ is regular and that $L_1$ is finite. Can we conclude from this that $L_2$ is regular?

13. If $L$ is a regular language, prove that $L_1 = \{uv : u \in L, |v| = 2\}$ is also regular.

14. If $L$ is a regular language, prove that the language $\{uv : u \in L, v \in L^R\}$ is also regular.

15. The left quotient of a language $L_1$ with respect to $L_2$ is defined as

$$L_2/L_1 = \{y : x \in L_2, xy \in L_1\}.$$

Show that the family of regular languages is closed under the left quotient with a regular language.

16. Show that if the statement “If $L_1$ is regular and $L_1 \cup L_2$ is also regular, then $L_2$ must be regular” were true for all $L_1$ and $L_2$, then all languages would be regular.

17. The tail of a language is defined as the set of all suffixes of its strings, that is,

$$tail(L) = \{y : xy \in L \text{ for some } x \in \Sigma^*\}.$$

Show that if $L$ is regular, so is $tail(L)$.

18. The head of a language is the set of all prefixes of its strings, that is,

$$head(L) = \{x : xy \in L \text{ for some } y \in \Sigma^*\}.$$

Show that the family of regular languages is closed under this operation.


19. Define an operation third on strings and languages as

$$third(a_1 a_2 a_3 a_4 a_5 a_6 \cdots) = a_3 a_6 \cdots$$

with the appropriate extension of this definition to languages. Prove the closure of the family of regular languages under this operation.

20. For a string $a_1 a_2 \cdots a_n$ define the operation shift as

$$shift(a_1 a_2 \cdots a_n) = a_2 \cdots a_n a_1.$$

From this, we can define the operation on a language as

$$shift(L) = \{v : v = shift(w) \text{ for some } w \in L\}.$$

Show that regularity is preserved under the shift operation.

21. Define

$$exchange(a_1 a_2 \cdots a_{n-1} a_n) = a_n a_2 \cdots a_{n-1} a_1,$$

and

$$exchange(L) = \{v : v = exchange(w) \text{ for some } w \in L\}.$$

Show that the family of regular languages is closed under exchange.

*22. The shuffle of two languages $L_1$ and $L_2$ is defined as


Show that the family of regular languages is closed under the shuffle operation. 


*23. Define an operation minus5 on a language $L$ as the set of all strings of $L$ with the fifth symbol from the left removed (strings of length less than five are left unchanged). Show that the family of regular languages is closed under the minus5 operation.

*24. Define the operation leftside on $L$ by

$$leftside(L) = \{w : ww^R \in L\}.$$

Is the family of regular languages closed under this operation?

25. The min of a language $L$ is defined as

$$min(L) = \{w \in L : \text{there is no } u \in L, v \in \Sigma^+, \text{ such that } w = uv\}.$$

Show that the family of regular languages is closed under the min operation.

26. Let $G_1$ and $G_2$ be two regular grammars. Show how one can derive regular grammars for the languages

(a) $L(G_1) \cup L(G_2)$.

(b) $L(G_1) L(G_2)$.

(c) $L(G_1)^*$.


4.2 Elementary Questions about Regular Languages 


We now come to a very fundamental issue: Given a language L and a string w, can we determine 
whether or not w is an element of L? This is the membership question and a method for answering it 
is called a membership algorithm. * Very little can be done with languages for which we cannot find 
efficient membership algorithms. The question of the existence and nature of membership algorithms 
will be of great concern in later discussions; it is an issue that is often difficult. For regular 
languages, though, it is an easy matter. 

We first consider what exactly we mean when we say “given a language....” In many arguments, it 
is important that this be unambiguous. We have used several ways of describing regular languages: 
informal verbal descriptions, set notation, finite automata, regular expressions, and regular grammars. 
Only the last three are sufficiently well defined for use in theorems. We therefore say that a regular 
language is given in a standard representation if and only if it is described by a finite automaton, a 
regular expression, or a regular grammar. 


Theorem 4.5 


Given a standard representation of any regular language $L$ on $\Sigma$ and any $w \in \Sigma^*$, there exists an algorithm for determining whether or not $w$ is in $L$.


Proof: We represent the language by some dfa, then test w to see if it is accepted by this automaton. m 


Other important questions are whether a language is finite or infinite, whether two languages are 
the same, and whether one language is a subset of another. For regular languages at least, these 
questions are easily answered. 


Theorem 4.6 


There exists an algorithm for determining whether a regular language, given in standard 
representation, is empty, finite, or infinite. 

Proof: The answer is apparent if we represent the language as a transition graph of a dfa. If there is a 
simple path from the initial vertex to any final vertex, then the language is not empty. 


To determine whether or not a language is infinite, find all the vertices that are the base of some 
cycle. If any of these are on a path from an initial to a final vertex, the language is infinite. Otherwise, 
it is finite. ■
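
The two graph tests in this proof can be sketched as follows. The Python code is an illustration under stated assumptions: the dfa is given as a transition dictionary, the function name `classify` is hypothetical, and the sample dfa is invented for the example. A vertex is called useful here if it lies on some path from the initial vertex to a final vertex; the language is infinite exactly when a useful vertex lies on a cycle.

```python
def classify(delta, start, finals):
    """Return 'empty', 'finite', or 'infinite' for the language of a dfa.

    delta maps (state, symbol) -> state.
    """
    succ, pred = {}, {}
    for (q, _), r in delta.items():
        succ.setdefault(q, set()).add(r)
        pred.setdefault(r, set()).add(q)

    def closure(sources, edges):
        """All vertices reachable from sources along the given edges."""
        seen, stack = set(sources), list(sources)
        while stack:
            for r in edges.get(stack.pop(), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    # Vertices lying on some path from the initial vertex to a final vertex.
    useful = closure({start}, succ) & closure(finals, pred)
    if not useful:
        return "empty"
    # A vertex is the base of a cycle if it can be reached from a successor.
    for q in useful:
        if q in closure(succ.get(q, set()), succ):
            return "infinite"
    return "finite"


# Hypothetical dfa accepting only the string "ab"; t is a trap state.
only_ab = {("s0", "a"): "s1", ("s0", "b"): "t",
           ("s1", "a"): "t", ("s1", "b"): "s2",
           ("s2", "a"): "t", ("s2", "b"): "t",
           ("t", "a"): "t", ("t", "b"): "t"}
print(classify(only_ab, "s0", {"s2"}))   # finite
print(classify(only_ab, "s0", set()))    # empty
print(classify(only_ab, "s0", {"t"}))    # infinite
```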


The question of the equality of two languages is also an important practical issue. Often several 
definitions of a programming language exist, and we need to know whether, in spite of their different 
appearances, they specify the same language. This is generally a difficult problem; even for regular 
languages the argument is not obvious. It is not possible to argue on a sentence-by-sentence 
comparison, since this works only for finite languages. Nor is it easy to see the answer by looking at 
the regular expressions, grammars, or dfa's. An elegant solution uses the already established closure 
properties. 


Theorem 4.7 


Given standard representations of two regular languages $L_1$ and $L_2$, there exists an algorithm to determine whether or not $L_1 = L_2$.


Proof: Using $L_1$ and $L_2$ we define the language

$$L_3 = (L_1 \cap \overline{L_2}) \cup (\overline{L_1} \cap L_2).$$

By closure, $L_3$ is regular, and we can find a dfa $M$ that accepts $L_3$. Once we have $M$ we can then use the algorithm in Theorem 4.6 to determine if $L_3$ is empty. But from Exercise 8, Section 1.1, we see that $L_3 = \emptyset$ if and only if $L_1 = L_2$. ■
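
As an illustration, the test can be sketched in Python. Rather than composing separate complement, intersection, and union constructions, this sketch uses a shortcut: it explores a single product automaton whose final vertices are the pairs in which exactly one component is final. That automaton accepts $L_3$, so $L_1 = L_2$ exactly when no such pair is reachable. The encoding of a dfa as `(delta, start, finals)`, the function name, and the sample dfa's are assumptions.

```python
def equivalent(dfa1, dfa2, alphabet):
    """Decide whether two dfa's accept the same language by testing L3 for emptiness.

    Each dfa is (delta, start, finals) with a total delta[(state, symbol)].
    """
    delta1, start1, finals1 = dfa1
    delta2, start2, finals2 = dfa2
    # Explore the reachable part of the product automaton for L3.
    seen, stack = {(start1, start2)}, [(start1, start2)]
    while stack:
        q, p = stack.pop()
        # (q, p) is final for L3 when exactly one of the two dfa's accepts.
        if (q in finals1) != (p in finals2):
            return False          # L3 is not empty, so the languages differ
        for a in alphabet:
            nxt = (delta1[(q, a)], delta2[(p, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True                   # no final vertex is reachable: L3 is empty


# Two hypothetical dfa's for "even number of a's", and one for "ends in b".
even_a = ({("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q0", ("q1", "b"): "q1"}, "q0", {"q0"})
mod4 = ({(i, c): ((i + 1) % 4 if c == "a" else i)
         for i in range(4) for c in "ab"}, 0, {0, 2})
ends_b = ({("p0", "a"): "p0", ("p0", "b"): "p1",
           ("p1", "a"): "p0", ("p1", "b"): "p1"}, "p0", {"p1"})
print(equivalent(even_a, mod4, "ab"))     # True
print(equivalent(even_a, ends_b, "ab"))   # False
```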


These results are fundamental, in spite of being obvious and unsurprising. For regular languages, 
the questions raised by Theorems 4.5 to 4.7 can be answered easily, but this is not always the case 
when we deal with other language families. We will encounter questions like these on several 
occasions later on. Anticipating a little, we will see that the answers become increasingly more 
difficult, and eventually impossible to find. 


EXERCISES 


For all the exercises in this section, assume that regular languages are given in standard 
representation. 


1. Show that there exists an algorithm to determine whether or not $w \in L_1 - L_2$, for any given $w$ and any regular languages $L_1$ and $L_2$.

2. Show that there exists an algorithm for determining if $L_1 \subseteq L_2$, for any regular languages $L_1$ and $L_2$.

3. Show that there exists an algorithm for determining if $\lambda \in L$, for any regular language $L$.

4. Show that for any regular $L_1$ and $L_2$ there is an algorithm to determine whether or not $L_1 = L_1/L_2$.

5. A language is said to be a palindrome language if $L = L^R$. Find an algorithm for determining if a given regular language is a palindrome language.

6. Exhibit an algorithm for determining whether or not a regular language $L$ contains any string $w$ such that $w^R \in L$.

7. Exhibit an algorithm that, given any three regular languages, $L$, $L_1$, $L_2$, determines whether or not $L = L_1 L_2$.

8. Exhibit an algorithm that, given any regular language $L$, determines whether or not $L = L^*$.

9. Let $L$ be a regular language on $\Sigma$ and $\hat{w}$ be any string in $\Sigma^*$. Find an algorithm to determine if $L$ contains any $w$ such that $\hat{w}$ is a substring of it, that is, such that $w = u\hat{w}v$ with $u, v \in \Sigma^*$.

10. Show that there is an algorithm to determine if $L = shuffle(L, L)$ for any regular $L$.

11. The operation $tail(L)$ is defined as

$$tail(L) = \{v : uv \in L, \; u, v \in \Sigma^*\}.$$

Show that there is an algorithm for determining whether or not $L = tail(L)$ for any regular $L$.

12. Let $L$ be any regular language on $\Sigma = \{a, b\}$. Show that an algorithm exists for determining if $L$ contains any strings of even length.

13. Show that there exists an algorithm that can determine for every regular language $L$, whether or not $|L| \ge 5$.

14. Find an algorithm for determining whether a regular language $L$ contains an infinite number of even-length strings.

15. Describe an algorithm which, when given a regular grammar $G$, can tell us whether or not $L(G) = \Sigma^*$.


* Later we will make precise what the term “algorithm” means. For the moment, think of it as a method for which one can write a 
computer program. 


4.3 Identifying Nonregular Languages


Regular languages can be infinite, as most of our examples have demonstrated. The fact that regular 
languages are associated with automata that have finite memory, however, imposes some limits on the 
structure of a regular language. Some narrow restrictions must be obeyed if regularity is to hold. 
Intuition tells us that a language is regular only if, in processing any string, the information that has to 
be remembered at any stage is strictly limited. This is true, but has to be shown precisely to be used 
in any meaningful way. There are several ways in which this can be done. 


Using the Pigeonhole Principle 


The term “pigeonhole principle” is used by mathematicians to refer to the following simple observation. If we put $n$ objects into $m$ boxes (pigeonholes), and if $n > m$, then at least one box must have more than one item in it. This is such an obvious fact that it is surprising how many deep results can be obtained from it.


Example 4.6 


Is the language $L = \{a^n b^n : n \ge 0\}$ regular? The answer is no, as we show using a proof by contradiction.

Suppose $L$ is regular. Then some dfa $M = (Q, \{a, b\}, \delta, q_0, F)$ exists for it. Now look at $\delta^*(q_0, a^i)$ for $i = 1, 2, 3, \ldots$. Since there are an unlimited number of $i$'s, but only a finite number of states in $M$, the pigeonhole principle tells us that there must be some state, say $q$, such that

$$\delta^*(q_0, a^n) = q$$

and

$$\delta^*(q_0, a^m) = q,$$

with $n \ne m$. But since $M$ accepts $a^n b^n$ we must have

$$\delta^*(q, b^n) = q_f \in F.$$

From this we can conclude that

$$\begin{aligned}
\delta^*(q_0, a^m b^n) &= \delta^*(\delta^*(q_0, a^m), b^n) \\
&= \delta^*(q, b^n) \\
&= q_f.
\end{aligned}$$

This contradicts the original assumption that $M$ accepts $a^m b^n$ only if $n = m$, and leads us to conclude that $L$ cannot be regular.


In this argument, the pigeonhole principle is just a way of stating unambiguously what we mean when we say that a finite automaton has a limited memory. To accept all $a^n b^n$, an automaton would have to differentiate between all prefixes $a^n$ and $a^m$. But since there are only a finite number of internal states with which to do this, there are some $n$ and $m$ for which the distinction cannot be made.
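
The collision promised by the pigeonhole principle can be found mechanically for any concrete dfa. The Python sketch below is an illustration only: the function names and the three-state sample dfa are assumptions. It feeds a's to the dfa until a state repeats, which yields $n \ne m$ with $\delta^*(q_0, a^n) = \delta^*(q_0, a^m)$; from then on the dfa cannot tell $a^n b^n$ from $a^m b^n$.

```python
def find_collision(delta, start):
    """Return (n, m) with n < m such that a^n and a^m lead to the same state."""
    first_seen = {}
    state, i = start, 0
    while state not in first_seen:      # must stop: there are finitely many states
        first_seen[state] = i
        state = delta[(state, "a")]
        i += 1
    return first_seen[state], i


def run(delta, state, w):
    """Compute the extended transition function on the string w."""
    for c in w:
        state = delta[(state, c)]
    return state


# Hypothetical three-state dfa over {a, b}: a advances the state, b leaves it alone.
delta = {(i, c): ((i + 1) % 3 if c == "a" else i)
         for i in range(3) for c in "ab"}
n, m = find_collision(delta, 0)
same = run(delta, 0, "a" * n + "b" * n) == run(delta, 0, "a" * m + "b" * n)
print(n, m, same)
```

Whatever dfa is supplied, `same` is true by construction, so no dfa can accept $a^n b^n$ while rejecting $a^m b^n$ for the pair found.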


In order to use this type of argument in a variety of situations, it is convenient to codify it as a 
general theorem. There are several ways to do this; the one we give here is perhaps the most famous 
one. 


A Pumping Lemma 


The following result, known as the pumping lemma for regular languages, uses the pigeonhole principle in another form. The proof is based on the observation that in a transition graph with $n$ vertices, any walk of length $n$ or longer must repeat some vertex, that is, contain a cycle.


Theorem 4.8 


Let $L$ be an infinite regular language. Then there exists some positive integer $m$ such that any $w \in L$ with $|w| \ge m$ can be decomposed as

$$w = xyz,$$

with

$$|xy| \le m,$$

and

$$|y| \ge 1,$$

such that

$$w_i = x y^i z, \tag{4.2}$$

is also in $L$ for all $i = 0, 1, 2, \ldots$.

To paraphrase this, every sufficiently long string in $L$ can be broken into three parts in such a way that an arbitrary number of repetitions of the middle part yields another string in $L$. We say that the middle string is “pumped,” hence the term pumping lemma for this result.


Proof: If $L$ is regular, there exists a dfa that recognizes it. Let such a dfa have states labeled $q_0, q_1, q_2, \ldots, q_n$. Now take a string $w$ in $L$ such that $|w| \ge m = n + 1$. Since $L$ is assumed to be infinite, this can always be done. Consider the set of states the automaton goes through as it processes $w$, say

$$q_0, q_i, q_j, \ldots, q_f.$$

Since this sequence has exactly $|w| + 1$ entries, at least one state must be repeated, and such a repetition must start no later than the $n$th move. Thus, the sequence must look like

$$q_0, q_i, q_j, \ldots, q_r, \ldots, q_r, \ldots, q_f,$$

indicating there must be substrings $x$, $y$, $z$ of $w$ such that

$$\begin{aligned}
\delta^*(q_0, x) &= q_r, \\
\delta^*(q_r, y) &= q_r, \\
\delta^*(q_r, z) &= q_f,
\end{aligned}$$

with $|xy| \le n + 1 = m$ and $|y| \ge 1$. From this it immediately follows that

$$\delta^*(q_0, xz) = q_f,$$

as well as

$$\begin{aligned}
\delta^*(q_0, xy^2z) &= q_f, \\
\delta^*(q_0, xy^3z) &= q_f,
\end{aligned}$$

and so on, completing the proof of the theorem. ■
a 
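The construction in the proof can be traced on a concrete machine. The Python sketch below is only an illustration: the three-state dfa (accepting strings over {a, b} that end in ab) and the names `DELTA`, `run`, and `pumpable_split` are assumptions made for this example and do not come from the text. It follows the state sequence, stops at the first repeated state $q_r$, and cuts w into x, y, z at the two visits.

```python
# Hypothetical dfa for illustration: accepts strings over {a, b} ending in "ab".
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
START, FINAL = 0, {2}


def run(w, state=START):
    """Extended transition function: the state reached from `state` on input w."""
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state


def pumpable_split(w):
    """Split w as (x, y, z) at the first repeated state, as in the proof."""
    first_visit = {START: 0}  # state -> number of symbols read at its first visit
    state = START
    for read, symbol in enumerate(w, start=1):
        state = DELTA[(state, symbol)]
        if state in first_visit:
            begin = first_visit[state]
            return w[:begin], w[begin:read], w[read:]
        first_visit[state] = read
    return None  # no state repeated; only possible when |w| < number of states


x, y, z = pumpable_split("aab")
# Every pumped string x y^i z ends in the same state as w itself.
assert all(run(x + y * i + z) == run("aab") for i in range(6))
```

Because the cut is made at the first repetition, |xy| never exceeds the number of states, which is the bound m used in the theorem.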


We have given the pumping lemma only for infinite languages. Finite languages, although always 
regular, cannot be pumped since pumping automatically creates an infinite set. The theorem does hold 
for finite languages, but it is vacuous. The m in the pumping lemma is to be taken larger than the 
longest string, so that no string can be pumped. 


The pumping lemma, like the pigeonhole argument in Example 4.6, is used to show that certain 
languages are not regular. The demonstration is always by contradiction. There is nothing in the 
pumping lemma, as we have stated it here, that can be used for proving that a language is regular. 
Even if we could show (and this is normally quite difficult) that any pumped string must be in the 
original language, there is nothing in the statement of Theorem 4.8 that allows us to conclude from this
that the language is regular. 


Example 4.7 


Use the pumping lemma to show that $L = \{a^n b^n : n \geq 0\}$ is not regular. Assume that L is regular, so
that the pumping lemma must hold. We do not know the value of m, but whatever it is, we can always
choose n = m, that is, $w = a^m b^m$. Since $|xy| \leq m$, the substring y must consist entirely of a's.
Suppose |y| = k. Then the string obtained by using i = 0 in Equation (4.2) is

$$w_0 = a^{m-k} b^m$$

and is clearly not in L. This contradicts the pumping lemma and thereby indicates that the assumption
that L is regular must be false.


In applying the pumping lemma, we must keep in mind what the theorem says. We are guaranteed 
the existence of an m as well as the decomposition xyz, but we do not know what they are. We cannot 
claim that we have reached a contradiction just because the pumping lemma is violated for some 
specific values of m or xyz. On the other hand, the pumping lemma holds for every $w \in L$ and every i.
Therefore, if the pumping lemma is violated even for one w or i, then the language cannot be regular. 


The correct argument can be visualized as a game we play against an opponent. Our goal is to win 
the game by establishing a contradiction of the pumping lemma, while the opponent tries to foil us. 
There are four moves in the game. 


1. The opponent picks m. 


2. Given m, we pick a string w in L of length equal to or greater than m. We are free to choose any w,
subject to $w \in L$ and $|w| \geq m$.


3. The opponent chooses the decomposition xyz, subject to $|xy| \leq m$, $|y| \geq 1$. We have to assume that
the opponent makes the choice that will make it hardest for us to win the game.


4. We try to pick i in such a way that the pumped string $w_i$, defined in Equation (4.2), is not in L. If
we can do so, we win the game.


A strategy that allows us to win whatever the opponent's choices is tantamount to a proof that the 
language is not regular. In this, Step 2 is crucial. While we cannot force the opponent to pick a 
particular decomposition of w, we may be able to choose w so that the opponent is very restricted in 
Step 3, forcing a choice of x, y, and z that allows us to produce a violation of the pumping lemma on 
our next move. 
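The game can be played out mechanically for one fixed m. The following Python sketch is an illustration under stated assumptions, not a proof: the names `in_L`, `decompositions`, and `our_replies` are invented here, and the search over i is cut off at a hypothetical bound `max_i`. It enumerates every decomposition the opponent may choose in Step 3 and looks for a reply i in Step 4.

```python
def in_L(s):
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def decompositions(w, m):
    """Every (x, y, z) with w = xyz, |xy| <= m and |y| >= 1 (the opponent's options)."""
    for xy_len in range(1, min(m, len(w)) + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]


def our_replies(w, m, in_lang, max_i=3):
    """Map each decomposition to an i with x y^i z not in the language.

    Returns None as soon as some decomposition survives every i <= max_i,
    which means this choice of w does not win the game (within the bound).
    """
    replies = {}
    for x, y, z in decompositions(w, m):
        reply = next((i for i in range(max_i + 1)
                      if not in_lang(x + y * i + z)), None)
        if reply is None:
            return None
        replies[(x, y, z)] = reply
    return replies


# With w = a^m b^m, every decomposition is beaten (here already by i = 0).
for m in range(1, 6):
    assert our_replies("a" * m + "b" * m, m, in_L) is not None
```

A check of this kind covers only the values of m that are tried; the written argument is still needed to cover all m.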


Example 4.8 


Show that

$$L = \{ww^R : w \in \Sigma^*\}$$

is not regular.

Whatever m the opponent picks on Step 1, we can always choose a w as shown in Figure 4.5.
Because of this choice, and the requirement that $|xy| \leq m$, the opponent is restricted in Step 3 to
choosing a y that consists entirely of a's. In Step 4, we use i = 0. The string obtained in this fashion
has fewer a's on the left than on the right and so cannot be of the form $ww^R$. Therefore, L is not
regular.


Figure 4.5 


Note that if we had chosen a w too short, then the opponent could have chosen a y with an even
number of b's. In that case, we could not have reached a violation of the pumping lemma on the last
step. We would also fail if we were to choose a string consisting of all a's, say,

$$w = a^{2m},$$

which is in L. To defeat us, the opponent need only pick

$$y = aa.$$

Now $w_i$ is in L for all i, and we lose.


To apply the pumping lemma we cannot assume that the opponent will make a wrong move. If, in
the case where we pick $w = a^{2m}$, the opponent were to pick

$$y = a,$$

then $w_0$ is a string of odd length and therefore not in L. But any argument that assumes that the
opponent is so accommodating is automatically incorrect.


Example 4.9 


Let $\Sigma = \{a, b\}$. The language

$$L = \{w \in \Sigma^* : n_a(w) < n_b(w)\}$$

is not regular.


Suppose we are given m. Since we have complete freedom in choosing w, we pick $w = a^m b^{m+1}$.
Now, because |xy| cannot be greater than m, the opponent cannot do anything but pick a y with all a's,
that is,

$$y = a^k, \quad 1 \leq k \leq m.$$

We now pump up, using i = 2. The resulting string

$$w_2 = a^{m+k} b^{m+1}$$

is not in L. Therefore, the pumping lemma is violated, and L is not regular.


Example 4.10 


The language

$$L = \{(ab)^n a^k : n > k, k \geq 0\}$$

is not regular.

Given m, we pick as our string

$$w = (ab)^{m+1} a^m,$$

which is in L. Because of the constraint $|xy| \leq m$, both x and y must be in the part of the string made up
of ab's. The choice of x does not affect the argument, so let us see what can be done with y. If our
opponent picks y = a, we choose i = 0 and get a string not in $L((ab)^* a^*)$. If the opponent picks y =
ab, we can choose i = 0 again. Now we get the string $(ab)^m a^m$, which is not in L. In the same way,
we can deal with any possible choice by the opponent, thereby proving our claim.


Example 4.11 


Show that

$$L = \{a^n : n \text{ is a perfect square}\}$$

is not regular.

Given the opponent's choice of m, we pick

$$w = a^{m^2}.$$

If w = xyz is the decomposition, then clearly

$$y = a^k$$

with $1 \leq k \leq m$. In that case,

$$w_0 = a^{m^2 - k}.$$

But $m^2 - k > (m - 1)^2$, so that $w_0$ cannot be in L. Therefore, the language is not regular.
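The inequality used in Example 4.11 can be spot-checked numerically. The sketch below is a sanity check rather than a proof, and the helper names are invented for this illustration. It assumes $m \geq 2$: for m = 1 and k = 1 the pumped-down length is 0, which is itself a perfect square. That restriction is harmless, since a pumping length m can always be replaced by a larger one.

```python
import math


def is_perfect_square(n):
    """True when n = r*r for some integer r >= 0."""
    r = math.isqrt(n)
    return r * r == n


def pumped_down_lengths(m):
    """Lengths m^2 - k of w_0 for every possible |y| = k, 1 <= k <= m."""
    return [m * m - k for k in range(1, m + 1)]


# For m >= 2 each length lies strictly between (m-1)^2 and m^2,
# so none of them can be a perfect square.
for m in range(2, 200):
    lengths = pumped_down_lengths(m)
    assert all((m - 1) ** 2 < n < m * m for n in lengths)
    assert not any(is_perfect_square(n) for n in lengths)
```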


In some cases, closure properties can be used to relate a given problem to one we have already 
classified. This may be simpler than a direct application of the pumping lemma. 


Example 4.12 


Show that the language

$$L = \{a^n b^k c^{n+k} : n \geq 0, k \geq 0\}$$

is not regular.


It is not difficult to apply the pumping lemma directly, but it is even easier to use closure under 
homomorphism. Take 


$$h(a) = a, \quad h(b) = a, \quad h(c) = c;$$

then

$$h(L) = \{a^{n+k} c^{n+k} : n + k \geq 0\} = \{a^i c^i : i \geq 0\}.$$


But we know this language is not regular; therefore, L cannot be regular either. 


Example 4.13 


Show that the language

$$L = \{a^n b^l : n \neq l\}$$

is not regular.


Here we need a bit of ingenuity to apply the pumping lemma directly. Choosing a string with n = l
+ 1 or n = l + 2 will not do, since our opponent can always choose a decomposition that will make it
impossible to pump the string out of the language (that is, pump it so that it has an equal number of
a's and b's). We must be more inventive. Let us take n = m! and l = (m + 1)!. If the opponent now
chooses a y (by necessity consisting of all a's) of length $k \leq m$, we pump i times to generate a string
with m! + (i - 1)k a's. We can get a contradiction of the pumping lemma if we can pick i such that

$$m! + (i - 1)k = (m + 1)!.$$

This is always possible since

$$i = 1 + \frac{m \, m!}{k}$$

and $k \leq m$. The right side is therefore an integer, and we have succeeded in violating the conditions of
the pumping lemma.
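The arithmetic behind the choice of i is easy to verify for small m. The Python sketch below is an illustrative check only; `pump_count` is a name introduced here. It confirms that $m \, m!/k$ is an integer whenever $1 \leq k \leq m$ and that the resulting i gives exactly (m + 1)! a's.

```python
from math import factorial


def pump_count(m, k):
    """The i from Example 4.13: i = 1 + m * m! / k, for 1 <= k <= m."""
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    quotient, remainder = divmod(m * factorial(m), k)
    # k <= m divides m!, so the division is exact.
    assert remainder == 0
    return 1 + quotient


for m in range(1, 9):
    for k in range(1, m + 1):
        i = pump_count(m, k)
        # Pumping y = a^k a total of i times yields (m+1)! a's, matching the b's.
        assert factorial(m) + (i - 1) * k == factorial(m + 1)
```

For instance, with m = 3 and k = 2 the sketch gives i = 10, and 3! + 9 · 2 = 24 = 4!.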


However, there is a much more elegant way of solving this problem. Suppose L were regular.
Then, by Theorem 4.1, $\overline{L}$ and the language

$$L_1 = \overline{L} \cap L(a^* b^*)$$

would also be regular. But $L_1 = \{a^n b^n : n \geq 0\}$, which we have already classified as nonregular.
Consequently, L cannot be regular.


The pumping lemma is difficult to understand and it is easy to go astray when applying it. Here 
are some common pitfalls. Watch out for them. 


One mistake is to try using the pumping lemma to show that a language is regular. Even if you can
show that no string in a language L can ever be pumped out, you cannot conclude that L is regular. The
pumping lemma can only be used to prove that a language is not regular.


Another mistake is to start (usually inadvertently) with a string not in L. For example, suppose we
try to show that

$$L = \{a^n : n \text{ is a prime number}\} \tag{4.3}$$

is not regular. An argument that starts with “Given m, let $w = a^m$...,” is incorrect since m is not
necessarily prime. To avoid this pitfall, we need to start with something like “Given m, let $w = a^M$,
where M is a prime number larger than m.”


Finally, perhaps the most common mistake is to make some assumptions about the decomposition
xyz. The only thing we can say about the decomposition is what the pumping lemma tells us, namely,
that y is not empty and that $|xy| \leq m$; that is, that y must be within m symbols of the left end of the
string. Anything else makes the argument invalid. A typical mistake in trying to prove that the
language in Equation (4.3) is not regular is to say that $y = a^k$, with k odd. Then of course w = xz is an
even-length string and thus not in L. But the assumption on k is not permitted and the proof is wrong.


But even if you master the technical difficulties of the pumping lemma, it may still be hard to see 
exactly how to use it. The pumping lemma is like a game with complicated rules. Knowledge of the 
rules is essential, but that alone is not enough to play a good game. You also need a good strategy to 
win. If you can apply the pumping lemma correctly to some of the more difficult cases in this book, 
you are to be congratulated. 


EXERCISES 


1. Prove the following version of the pumping lemma. If L is regular, then there is an m such that
every $w \in L$ of length greater than m can be decomposed as

$$w = xyz,$$

with $|yz| \leq m$ and $|y| \geq 1$, such that $xy^iz$ is in L for all i.


2. Prove the following generalization of the pumping lemma, which includes Theorem 4.8 as well as
Exercise 1 as special cases.


If L is regular, then there exists an m, such that the following holds for every sufficiently long $w \in L$
and every one of its decompositions $w = u_1 v u_2$, with $u_1, u_2 \in \Sigma^*$, $|v| \geq m$. The middle string v can
be written as v = xyz, with $|xy| \leq m$, $|y| \geq 1$, such that $u_1 x y^i z u_2 \in L$ for all i = 0, 1, 2, ....
3. Show that the language $L = \{w : n_a(w) = n_b(w)\}$ is not regular. Is $L^*$ regular?
4. Prove that the following languages are not regular.
(a) $L = \{a^n b^l a^k : k \geq n + l\}$.
(b) $L = \{a^n b^l a^k : k \neq n + l\}$.
(c) $L = \{a^n b^l a^k : n = l \text{ or } l \neq k\}$.
(d) $L = \{a^n b^l : n \leq l\}$.
(e) $L = \{w : n_a(w) \neq n_b(w)\}$.
(f) $L = \{ww : w \in \{a, b\}^*\}$.
(g) $L = \{www : w \in \{a, b\}^*\}$.
5. Determine whether or not the following languages on $\Sigma = \{a\}$ are regular.
(a) $L = \{a^n : n \geq 2, n \text{ is a prime number}\}$.
(b) $L = \{a^n : n \text{ is not a prime number}\}$.
(c) L= {a":n=P for some k > 0}. 
(d) $L = \{a^n : n = 2^k \text{ for some } k \geq 0\}$.
(e) $L = \{a^n : n \text{ is the product of two prime numbers}\}$.
(f) $L = \{a^n : n \text{ is either prime or the product of two or more prime numbers}\}$.
(g) L” ., where L is the language in part (a). 
6. Determine whether or not the following languages are regular. 
(a). ofa Ra TU arin Sam = TE 
Ce ae {a Pe OV 1} U {a BES say 5 1}. 


7. Show that the language
$$L = \{a^n b^n : n \geq 0\} \cup \{a^n b^{n+1} : n \geq 0\} \cup \{a^n b^{n+2} : n \geq 0\}$$
is not regular.


* 8. Show that the language 


is not regular. 
9. Is the language $L = \{w \in \{a, b, c\}^* : |w| = 3n_a(w)\}$ regular?
10. Consider the language
$$L = \{a^n : n \text{ is not a perfect square}\}.$$


* (a) Show that this language is not regular by applying the pumping lemma directly. 
(b) Then show the same thing by using the closure properties of regular languages. 


* 11. Show that the language 
L= {a™" in >1} 
is not regular. 


12. Apply the pumping lemma directly to show the result in Example 4.12. 
13. Show that the following language is not regular. 


$$L = \{a^n b^k : n > k\} \cup \{a^n b^k : n \neq k - 1\}.$$


14. Prove or disprove the following statement: If $L_1$ and $L_2$ are nonregular languages, then $L_1 \cup L_2$ is
also nonregular.


15. Consider the languages below. For each, make a conjecture whether or not it is regular. Then 
prove your conjecture. 


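(a) $L = \{a^n b^l a^k : n + l + k > 5\}$.
(b) $L = \{a^n b^l a^k : n > 5, l > 3, k \leq l\}$.
(c) $L = \{a^n b^l : n/l \text{ is an integer}\}$.
(d) $L = \{a^n b^l : n + l \text{ is a prime number}\}$.
(e) $L = \{a^n b^l : n \leq l \leq 2n\}$.
(f) $L = \{a^n b^l : n \geq 100, l \leq 100\}$.
(g) $L = \{a^n b^l : |n - l| = 2\}$.
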

16. Is the following language regular?
$$L = \{w_1 c w_2 : w_1, w_2 \in \{a, b\}^*, w_1 \neq w_2\}.$$
17. Let $L_1$ and $L_2$ be regular languages. Is the language $L = \{w : w \in L_1, w^R \in L_2\}$ necessarily regular?


18. Apply the pigeonhole argument directly to the language in Example 4.8. 
19. Are the following languages regular? 


(a) $L = \{uww^Rv : u, v, w \in \{a, b\}^*\}$.
(b) $L = \{uww^Rv : u, v, w \in \{a, b\}^*, |u| \geq |v|\}$.
20. Is the following language regular? 
L= {ww*e :u,we {a,b} } 


21. Let P be an infinite but countable set, and associate with each $p \in P$ a language $L_p$. The smallest
set containing every $L_p$ is the union over the infinite set P; it will be denoted by $\bigcup_{p \in P} L_p$. Show by
example that the family of regular languages is not closed under infinite union.


22. Consider the argument in Section 3.2 that the language associated with any generalized transition
graph is regular. The language associated with such a graph is

$$L = \bigcup_{p \in P} L(r_p),$$

where P is the set of all walks through the graph and $r_p$ is the expression associated with a walk
p. The set of walks is generally infinite, so that in light of Exercise 21, it does not immediately
follow that L is regular. Show that in this case, because of the special nature of P, the infinite
union is regular.


* 23. Is the family of regular languages closed under infinite intersection?
24. Suppose that we know that $L_1 \cup L_2$ and $L_1$ are regular. Can we conclude from this that $L_2$ is
regular?
25. In the chain code language in Exercise 24, Section 3.1, let L be the set of all $w \in \{u, r, l, d\}^*$ that
describe rectangles. Show that L is not a regular language.
26. Let $L = \{a^n b^m : n \geq 100, m \leq 50\}$.
(a) Can you use the pumping lemma to show that L is regular?
(b) Can you use the pumping lemma to show that L is not regular? Explain your answers.
27. Show that the language generated by the grammar $S \to aSS \mid b$ is not regular.


Chapter 5 


Context-Free Languages


In the last chapter, we discovered that not all languages are regular. While regular languages
are effective in describing certain simple patterns, one does not need to look very far for
examples of nonregular languages. The relevance of these limitations to programming
languages becomes evident if we reinterpret some of the examples. If in $L = \{a^n b^n : n \geq 0\}$ we
substitute a left parenthesis for a and a right parenthesis for b, then parentheses strings such as
(()) and ((())) are in L, but (() is not. The language therefore describes a simple kind of nested
structure found in programming languages, indicating that some properties of programming languages
require something beyond regular languages. In order to cover this and other more complicated
features we must enlarge the family of languages. This leads us to consider context-free languages
and grammars.


We begin this chapter by defining context-free grammars and languages, illustrating the definitions 
with some simple examples. Next, we consider the important membership problem; in particular we 
ask how we can tell if a given string is derivable from a given grammar. Explaining a sentence 
through its grammatical derivation is familiar to most of us from a study of natural languages and is 
called parsing. Parsing is a way of describing sentence structure. It is important whenever we need to 
understand the meaning of a sentence, as we do for instance in translating from one language to 
another. In computer science, this is relevant in interpreters, compilers, and other translating 
programs. 

The topic of context-free languages is perhaps the most important aspect of formal language 
theory as it applies to programming languages. Actual programming languages have many features that 
can be described elegantly by means of context-free languages. What formal language theory tells us 
about context-free languages has important applications in the design of programming languages as 
well as in the construction of efficient compilers. We touch upon this briefly in Section 5.3. 


5.1 Context-Free Grammars 


The productions in a regular grammar are restricted in two ways: The left side must be a single 
variable, while the right side has a special form. To create grammars that are more powerful, we 
must relax some of these restrictions. By retaining the restriction on the left side, but permitting 
anything on the right, we get context-free grammars. 


Definition 5.1 


A grammar G = (V, T, S, P) is said to be context-free if all productions in P have the form

$$A \to x,$$

where $A \in V$ and $x \in (V \cup T)^*$.
A language L is said to be context-free if and only if there is a context-free grammar G such that L
= L(G).


Every regular grammar is context-free, so a regular language is also a context-free one. But, as we 
know from simple examples such as $\{a^n b^n\}$, there are nonregular languages. We have already shown
in Example 1.11 that this language can be generated by a context-free grammar, so we see that the 
family of regular languages is a proper subset of the family of context-free languages. 

Context-free grammars derive their name from the fact that the substitution of the variable on the 
left of a production can be made any time such a variable appears in a sentential form. It does not 
depend on the symbols in the rest of the sentential form (the context). This feature is the consequence 
of allowing only a single variable on the left side of the production. 


Examples of Context-Free Languages 


Example 5.1 


The grammar G = ({S}, {a, b}, S, P), with productions

$$S \to aSa,$$
$$S \to bSb,$$
$$S \to \lambda,$$

is context-free. A typical derivation in this grammar is

$$S \Rightarrow aSa \Rightarrow aaSaa \Rightarrow aabSbaa \Rightarrow aabbaa.$$

This, and similar derivations, make it clear that

$$L(G) = \{ww^R : w \in \{a, b\}^*\}.$$

The language is context-free, but as shown in Example 4.8, it is not regular.
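The claim about L(G) can be checked for short strings by enumerating derivations. The Python sketch below is an illustration only; `PRODUCTIONS` and `sentences` are names chosen for this example, and the cutoff `max_len` is an assumption that keeps the search finite. Since the grammar is linear, each sentential form contains at most one S, so replacing the first S is enough.

```python
from itertools import product

# Right sides of S -> aSa | bSb | lambda (the empty string stands for lambda).
PRODUCTIONS = ["aSa", "bSb", ""]


def sentences(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    found, pending = set(), ["S"]
    while pending:
        form = pending.pop()
        if "S" not in form:
            found.add(form)
            continue
        for rhs in PRODUCTIONS:
            new_form = form.replace("S", rhs, 1)
            # Terminals are never erased, so overlong forms can be discarded.
            if len(new_form) - new_form.count("S") <= max_len:
                pending.append(new_form)
    return found


def even_palindromes(max_len):
    """The strings w w^R with |w w^R| <= max_len, built directly."""
    return {"".join(w) + "".join(reversed(w))
            for n in range(max_len // 2 + 1)
            for w in product("ab", repeat=n)}


assert sentences(6) == even_palindromes(6)
```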


Example 5.2 


The grammar G, with productions

$$S \to abB,$$
$$A \to aaBb,$$
$$B \to bbAa,$$
$$A \to \lambda,$$

is context-free. We leave it to the reader to show that

$$L(G) = \{ab(bbaa)^n bba(ba)^n : n \geq 0\}.$$


Both of the above examples involve grammars that are not only context-free, but linear. Regular 
and linear grammars are clearly context-free, but a context-free grammar is not necessarily linear. 


Example 5.3 


The language

$$L = \{a^n b^m : n \neq m\}$$

is context-free.


To show this, we need to produce a context-free grammar for the language. The case of n = m is
solved in Example 1.11 and we can build on that solution. Take the case n > m. We first generate a
string with an equal number of a's and b's, then add extra a's on the left. This is done with

$$S \to AS_1,$$
$$S_1 \to aS_1b \mid \lambda,$$
$$A \to aA \mid a.$$

We can use similar reasoning for the case n < m, and we get the answer

$$S \to AS_1 \mid S_1B,$$
$$S_1 \to aS_1b \mid \lambda,$$
$$A \to aA \mid a,$$
$$B \to bB \mid b.$$

The resulting grammar is context-free, hence L is a context-free language. However, the grammar is 
not linear. 


The particular form of the grammar given here was chosen for the purpose of illustration; there 
are many other equivalent context-free grammars. In fact, there are some simple linear ones for this 
language. In Exercise 26 at the end of this section you are asked to find one of them. 


Example 5.4 


Consider the grammar with productions

$$S \to aSb \mid SS \mid \lambda.$$

This is another grammar that is context-free, but not linear. Some strings in L(G) are abaabb, aababb,
and ababab. It is not difficult to conjecture and prove that

$$L = \{w \in \{a, b\}^* : n_a(w) = n_b(w) \text{ and } n_a(v) \geq n_b(v), \text{ where } v \text{ is any prefix of } w\}. \tag{5.1}$$

We can see the connection with programming languages clearly if we replace a and b with left and
right parentheses, respectively. The language L includes such strings as (()) and ()()() and is in fact
the set of all properly nested parenthesis structures for the common programming languages.
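The condition in Equation (5.1) translates directly into a one-pass counter test. The Python sketch below is illustrative; `satisfies_5_1` is a name made up for this example, and the input is assumed to contain only the symbols a and b. The counter holds $n_a(v) - n_b(v)$ for the prefix v read so far.

```python
def satisfies_5_1(w):
    """Check n_a(w) = n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    surplus = 0  # n_a(v) - n_b(v) for the prefix v read so far
    for symbol in w:
        surplus += 1 if symbol == "a" else -1
        if surplus < 0:       # some prefix has more b's than a's
            return False
    return surplus == 0       # equal numbers of a's and b's overall


# The three sample strings from Example 5.4 all satisfy (5.1).
assert all(satisfies_5_1(w) for w in ["abaabb", "aababb", "ababab"])
# "abba" has equal counts but a bad prefix "abb".
assert not satisfies_5_1("abba")
```

Replacing a by "(" and b by ")" turns the same loop into the familiar balanced-parentheses check.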

Here again there are many other equivalent grammars. But, in contrast to Example 5.3, it is not so 
easy to see if there are any linear ones. We will have to wait until Chapter 8 before we can answer 
this question. 


Leftmost and Rightmost Derivations 


In a grammar that is not linear, a derivation may involve sentential forms with more than one variable. 
In such cases, we have a choice in the order in which variables are replaced. Take, for example, the 
grammar G = ({A, B, S}, {a, b}, S, P) with productions 


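1. $S \to AB$.
2. $A \to aaA$.
3. $A \to \lambda$.
4. $B \to Bb$.
5. $B \to \lambda$.


This grammar generates the language $L(G) = \{a^{2n} b^m : n \geq 0, m \geq 0\}$. Carry out a few derivations to
convince yourself of this.


Consider now the two derivations

$$S \overset{1}{\Rightarrow} AB \overset{2}{\Rightarrow} aaAB \overset{3}{\Rightarrow} aaB \overset{4}{\Rightarrow} aaBb \overset{5}{\Rightarrow} aab$$

and

$$S \overset{1}{\Rightarrow} AB \overset{4}{\Rightarrow} ABb \overset{2}{\Rightarrow} aaABb \overset{5}{\Rightarrow} aaAb \overset{3}{\Rightarrow} aab.$$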


In order to show which production is applied, we have numbered the productions and written the
appropriate number on the $\Rightarrow$ symbol. From this we see that the two derivations not only yield the
same sentence but also use exactly the same productions. The difference is entirely in the order in
which the productions are applied. To remove such irrelevant factors, we often require that the
variables be replaced in a specific order.


Definition 5.2


A derivation is said to be leftmost if in each step the leftmost variable in the sentential form is
replaced. If in each step the rightmost variable is replaced, we call the derivation rightmost.


Example 5.5 


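Consider the grammar with productions

$$S \to aAB,$$
$$A \to bBb,$$
$$B \to A \mid \lambda.$$

Then

$$S \Rightarrow aAB \Rightarrow abBbB \Rightarrow abAbB \Rightarrow abbBbbB \Rightarrow abbbbB \Rightarrow abbbb$$

is a leftmost derivation of the string abbbb. A rightmost derivation of the same string is

$$S \Rightarrow aAB \Rightarrow aA \Rightarrow abBb \Rightarrow abAb \Rightarrow abbBbb \Rightarrow abbbb.$$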
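Both derivations in Example 5.5 can be replayed step by step. In the Python sketch below, `derive`, `LEFTMOST_STEPS`, and `RIGHTMOST_STEPS` are names invented for this illustration. Each step names a production as a pair (left side, right side), and the function refuses a step unless that left side really is the leftmost (or rightmost) variable of the current sentential form.

```python
VARIABLES = set("SAB")


def derive(steps, leftmost=True, start="S"):
    """Apply productions to the leftmost (or rightmost) variable; return all forms."""
    form, forms = start, [start]
    for lhs, rhs in steps:
        positions = [p for p, symbol in enumerate(form) if symbol in VARIABLES]
        if not positions:
            raise ValueError("no variable left to replace in " + form)
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"{lhs} is not the selected variable in {form}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


# Productions of Example 5.5; the empty string stands for lambda.
LEFTMOST_STEPS = [("S", "aAB"), ("A", "bBb"), ("B", "A"),
                  ("A", "bBb"), ("B", ""), ("B", "")]
RIGHTMOST_STEPS = [("S", "aAB"), ("B", ""), ("A", "bBb"),
                   ("B", "A"), ("A", "bBb"), ("B", "")]

assert derive(LEFTMOST_STEPS)[-1] == "abbbb"
assert derive(RIGHTMOST_STEPS, leftmost=False)[-1] == "abbbb"
```

The two step lists contain the same productions with the same multiplicities; only their order differs.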


Derivation Trees 


A second way of showing derivations, independent of the order in which productions are used, is by 
a derivation or parse tree. A derivation tree is an ordered tree in which nodes are labeled with the 
left sides of productions and in which the children of a node represent its corresponding right sides. 
For example, Figure 5.1 shows part of a derivation tree representing the production 


A — abABc. 


In a derivation tree, a node labeled with a variable occurring on the left side of a production has 
children consisting of the symbols on the right side of that production. Beginning with the root, 
labeled with the start symbol and ending in leaves that are terminals, a derivation tree shows how 
each variable is replaced in the derivation. The following definition makes this notion precise. 


Figure 5.1 




Definition 5.3 


Let G = (V, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G if and only
if it has the following properties.


1. The root is labeled S.
2. Every leaf has a label from $T \cup \{\lambda\}$.
3. Every interior vertex (a vertex that is not a leaf) has a label from V.


4. If a vertex has label $A \in V$, and its children are labeled (from left to right) $a_1, a_2, \ldots, a_n$, then P
must contain a production of the form

$$A \to a_1 a_2 \cdots a_n.$$


5. A leaf labeled $\lambda$ has no siblings, that is, a vertex with a child labeled $\lambda$ can have no other
children.


A tree that has properties 3, 4, and 5, but in which 1 does not necessarily hold and in which
property 2 is replaced by


2a. Every leaf has a label from $V \cup T \cup \{\lambda\}$,


is said to be a partial derivation tree.


The string of symbols obtained by reading the leaves of the tree from left to right, omitting any $\lambda$'s
encountered, is said to be the yield of the tree. The descriptive term left to right can be given a 
precise meaning. The yield is the string of terminals in the order they are encountered when the tree is 
traversed in a depth-first manner, always taking the leftmost unexplored branch. 


Example 5.6 


Consider the grammar G, with productions 


$$S \to aAB,$$
$$A \to bBb,$$
$$B \to A \mid \lambda.$$

The tree in Figure 5.2 is a partial derivation tree for G, while the tree in Figure 5.3 is a derivation
tree. The string abBbB, which is the yield of the first tree, is a sentential form of G. The yield of the
second tree, abbbb, is a sentence of L(G).
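The yield of a tree is just a left-to-right, depth-first reading of its leaves. The Python sketch below illustrates this on two trees for the grammar of Example 5.6. The figures are not reproduced here, so the two trees are assumptions: they are built to have the yields stated in the example and may not match Figures 5.2 and 5.3 node for node. The tuple representation and the names `leaf` and `tree_yield` are likewise invented for this illustration.

```python
LAMBDA = ""  # a leaf labeled lambda contributes nothing to the yield


def leaf(label):
    """A node without children."""
    return (label, [])


def tree_yield(node):
    """Read the leaves depth-first, always taking the leftmost branch first."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


# Assumed partial derivation tree: S -> aAB, then A -> bBb.
partial_tree = ("S", [leaf("a"),
                      ("A", [leaf("b"), leaf("B"), leaf("b")]),
                      leaf("B")])

# Assumed derivation tree: every variable expanded down to terminals or lambda.
inner_A = ("A", [leaf("b"), ("B", [leaf(LAMBDA)]), leaf("b")])
full_tree = ("S", [leaf("a"),
                   ("A", [leaf("b"), ("B", [inner_A]), leaf("b")]),
                   ("B", [leaf(LAMBDA)])])

assert tree_yield(partial_tree) == "abBbB"
assert tree_yield(full_tree) == "abbbb"
```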


Figure 5.2 


Figure 5.3


Relation Between Sentential Forms and Derivation Trees


Derivation trees give a very explicit and easily comprehended description of a derivation. Like 
transition graphs for finite automata, this explicitness is a great help in making arguments. First, 
though, we must establish the connection between derivations and derivation trees. 


Theorem 5.1 


Let G = (V, T, S, P) be a context-free grammar. Then for every $w \in L(G)$, there exists a derivation
tree of G whose yield is w. Conversely, the yield of any derivation tree is in L(G). Also, if $t_G$ is any
partial derivation tree for G whose root is labeled S, then the yield of $t_G$ is a sentential form of G.


Proof: First we show that for every sentential form of L(G) there is a corresponding partial
derivation tree. We do this by induction on the number of steps in the derivation. As a basis, we note
that the claimed result is true for every sentential form derivable in one step. Since $S \Rightarrow u$ implies that
there is a production $S \to u$, this follows immediately from Definition 5.3.


Assume that for every sentential form derivable in n steps, there is a corresponding partial
derivation tree. Now any w derivable in n + 1 steps must be such that

$$S \overset{*}{\Rightarrow} xAy, \quad x, y \in (V \cup T)^*, \quad A \in V,$$

in n steps, and

$$xAy \Rightarrow x a_1 a_2 \cdots a_m y = w, \quad a_i \in V \cup T.$$

Since by the inductive assumption there is a partial derivation tree with yield xAy, and since the
grammar must have production $A \to a_1 a_2 \cdots a_m$, we see that by expanding the leaf labeled A, we get a
partial derivation tree with yield $x a_1 a_2 \cdots a_m y = w$. By induction, we therefore claim that the result is
true for all sentential forms.


In a similar vein, we can show that every partial derivation tree represents some sentential form. 
We will leave this as an exercise. 


Since a derivation tree is also a partial derivation tree whose leaves are terminals, it follows that 
every sentence in L (G) is the yield of some derivation tree of G and that the yield of every derivation 
tree is in L(G). 


Derivation trees show which productions are used in obtaining a sentence, but do not give the 
order of their application. Derivation trees are able to represent any derivation, reflecting the fact that 
this order is irrelevant, an observation that allows us to close a gap in the preceding discussion. By 
definition, any $w \in L(G)$ has a derivation, but we have not claimed that it also had a leftmost or
rightmost derivation. However, once we have a derivation tree, we can always get a leftmost
derivation by thinking of the tree as having been built in such a way that the leftmost variable in the
tree was always expanded first. Filling in a few details, we are led to the not surprising result that
any $w \in L(G)$ has a leftmost and a rightmost derivation (for details, see Exercise 25 at the end of this
section). 


EXERCISES 


1. Complete the arguments in Example 5.2, showing that the language given is generated by the 
grammar. 


2. Draw the derivation tree corresponding to the derivation in Example 5.1. 


3. Give a derivation tree for w = abbbaabbaba for the grammar in Example 5.2. Use the derivation 
tree to find a leftmost derivation. 


4. Show that the grammar in Example 5.4 does in fact generate the language described in Equation
(5.1).


5. Is the language in Example 5.2 regular? 


6. Complete the proof in Theorem 5.1 by showing that the yield of every partial derivation tree with 
root S is a sentential form of G.


7. Find context-free grammars for the following languages (with $n \geq 0$, $m \geq 0$).
(a) $L = \{a^n b^m : n \leq m + 3\}$.
(b) $L = \{a^n b^m : n \neq m - 1\}$.
(c) $L = \{a^n b^m : n \neq 2m\}$.
(d) $L = \{a^n b^m : 2n \leq m \leq 3n\}$.
(e) $L = \{w \in \{a, b\}^* : n_a(w) \neq n_b(w)\}$.
(f) $L = \{w \in \{a, b\}^* : n_a(v) \geq n_b(v), \text{ where } v \text{ is any prefix of } w\}$.
(g) $L = \{w \in \{a, b\}^* : n_a(w) = 2n_b(w) + 1\}$.
8. Find context-free grammars for the following languages (with $n \geq 0$, $m \geq 0$, $k \geq 0$).
(a) $L = \{a^n b^m c^k : n = m \text{ or } m \leq k\}$.
(b) $L = \{a^n b^m c^k : n = m \text{ or } m \neq k\}$.
(c) $L = \{a^n b^m c^k : k = n + m\}$.
(d) $L = \{a^n b^m c^k : n + 2m = k\}$.
(e) $L = \{a^n b^m c^k : k = |n - m|\}$.
(f) $L = \{w \in \{a, b, c\}^* : n_a(w) + n_b(w) \neq n_c(w)\}$.
(g) $L = \{a^n b^m c^k : k \neq n + m\}$.
(h) $L = \{a^n b^m c^k : k \geq 3\}$.
9. Show that $L = \{w \in \{a, b, c\}^* : |w| = 3n_a(w)\}$ is a context-free language.


10. Find a context-free grammar for head (L), where L is the language in Exercise 7(a) above. For 
the definition of head see Exercise 18, Section 4.1. 


11. Find a context-free grammar for $\Sigma = \{a, b\}$ for the language $L = \{a^n w w^R b^n : w \in \Sigma^*, n \geq 1\}$.


*12. Given a context-free grammar G for a language L, show how one can create from G a grammar
$\hat{G}$ so that $L(\hat{G}) = head(L)$.


13. Let $L = \{a^n b^n : n \geq 0\}$.
(a) Show that $L^2$ is context-free.


(b) Show that $L^k$ is context-free for any given $k \geq 1$.


(c) Show that $\overline{L}$ and $L^*$ are context-free.


14. Let $L_1$ be the language in Exercise 8(a) and $L_2$ the language in Exercise 8(d). Show that $L_1 \cup L_2$ is
a context-free language.


15. Show that the following language is context-free. 
$$L = \{uvwv^R : u, v, w \in \{a, b\}^*, |u| = |w| = 2\}.$$


*16. Show that the complement of the language in Example 5.1 is context-free. 


17. Show that the complement of the language in Exercise 8(c) is context-free. 


18. Show that the language $L = \{w_1 c w_2 : w_1, w_2 \in \{a, b\}^+, w_1 \neq w_2^R\}$, with $\Sigma = \{a, b, c\}$, is context-free.


19. Show a derivation tree for the string aabbbb with the grammar 


$$S \to AB \mid \lambda,$$
$$A \to aB,$$
$$B \to Sb.$$


Give a verbal description of the language generated by this grammar. 


20. Consider the grammar with productions 


$$S \to aaB,$$
$$A \to bBb \mid \lambda,$$
$$B \to Aa.$$


Show that the string aabbabba is not in the language generated by this grammar. 


21. Consider the derivation tree below.


[Figure: derivation tree; not recoverable from the source]


Find a grammar G for which this is the derivation tree of the string aab. Then find two more 
sentences of L(G). Find a sentence in L(G) that has a derivation tree of height five or larger. 


22. Define what one might mean by properly nested parenthesis structures involving two kinds of 
parentheses, say () and []. Intuitively, properly nested strings in this situation are ([]), ([[]])[()],
but not ([)] or ((]]. Using your definition, give a context-free grammar for generating all properly 
nested parentheses. 


23. Find a context-free grammar for the set of all regular expressions on the alphabet {a, b}. 


24. Find a context-free grammar that can generate all the production rules for context-free grammars 
with T= {a, b} and V= {A, B, C}. 

25. Prove that if G is a context-free grammar, then every $w \in L(G)$ has a leftmost and rightmost
derivation. Give an algorithm for finding such derivations from a derivation tree. 

26. Find a linear grammar for the language in Example 5.3. 


27. Let G = (V, T, S, P) be a context-free grammar such that every one of its productions is of the form
$A \to v$, with |v| = k > 1. Show that the derivation tree for any $w \in L(G)$ has a height h such that

$$\log_k |w| \leq h \leq \frac{|w| - 1}{k - 1}.$$


5.2 Parsing and Ambiguity


We have so far concentrated on the generative aspects of grammars. Given a grammar G, we studied 
the set of strings that can be derived using G. In cases of practical applications, we are also 
concerned with the analytical side of the grammar: Given a string w of terminals, we want to know 
whether or not w is in L(G). If so, we may want to find a derivation of w. An algorithm that can tell us 
whether w is in L(G) is a membership algorithm. The term parsing describes finding a sequence of 
productions by which a $w \in L(G)$ is derived.


Parsing and Membership 


Given a string w in L(G), we can parse it in a rather obvious fashion: We systematically construct all 


possible (say, leftmost) derivations and see whether any of them match w. Specifically, we start at 
round one by looking at all productions of the form 


$$S \to x,$$


finding all x that can be derived from S in one step. If none of these results in a match with w, we go 
to the next round, in which we apply all applicable productions to the leftmost variable of every x. 
This gives us a set of sentential forms, some of them possibly leading to w. On each subsequent round, 
we again take all leftmost variables and apply all possible productions. It may be that some of these 
sentential forms can be rejected on the grounds that w can never be derived from them, but in general, 
we will have on each round a set of possible sentential forms. After the first round, we have 
sentential forms that can be derived by applying a single production, after the second round we have 
the sentential forms that can be derived in two steps, and so on. If w ∈ L(G), then it must have a 
leftmost derivation of finite length. Thus, the method will eventually give a leftmost derivation of w. 


For reference below, we will call this exhaustive search parsing or brute force parsing. It is a 
form of top-down parsing, which we can view as the construction of a derivation tree from the root 
down. 


Example 5.7 


Consider the grammar 
S → SS | aSb | bSa | λ 


and the string w = aabb. Round one gives us 


1. S ⇒ SS, 
2. S ⇒ aSb, 
3. S ⇒ bSa, 
4. S ⇒ λ. 


The last two of these can be removed from further consideration for obvious reasons. Round two then 
yields sentential forms 


S ⇒ SS ⇒ SSS, 
S ⇒ SS ⇒ aSbS, 
S ⇒ SS ⇒ bSaS, 
S ⇒ SS ⇒ S, 


which are obtained by replacing the leftmost S in sentential form 1 with all applicable substitutes. 
Similarly, from sentential form 2 we get the additional sentential forms 


S ⇒ aSb ⇒ aSSb, 
S ⇒ aSb ⇒ aaSbb, 
S ⇒ aSb ⇒ abSab, 
S ⇒ aSb ⇒ ab. 


Again, several of these can be removed from contention. On the next round, we find the actual target 
string from the sequence 


S ⇒ aSb ⇒ aaSbb ⇒ aabb. 


Therefore, aabb is in the language generated by the grammar under consideration. 


Exhaustive search parsing has serious flaws. The most obvious one is its tediousness; it is not to 
be used where efficient parsing is required. But even when efficiency is a secondary issue, there is a 
more pertinent objection. While the method always parses a w ∈ L(G), it is possible that it never 
terminates for strings not in L(G). This is certainly the case in the previous example; with w = abb, 
the method will go on producing trial sentential forms indefinitely unless we build into it some way of 
stopping. 

The problem of nontermination of exhaustive search parsing is relatively easy to overcome if we 
restrict the form that the grammar can have. If we examine Example 5.7, we see that the difficulty 
comes from the production S → λ; this production can be used to decrease the length of successive 
sentential forms, so that we cannot tell easily when to stop. If we do not have any such productions, 
then we have many fewer difficulties. In fact, there are two types of productions we want to rule out, 
those of the form A → λ as well as those of the form A → B. As we will see in the next chapter, this 
restriction does not affect the power of the resulting grammars in any significant way. 


Example 5.8 


The grammar 
S → SS | aSb | bSa | ab | ba 


satisfies the given requirements. It generates the language in Example 5.7 without the empty string. 


Given any w ∈ {a, b}^+, the exhaustive search parsing method will always terminate in no more 
than |w| rounds. This is clear because the length of the sentential form grows by at least one symbol in 
each round. After |w| rounds we have either produced a parsing or we know that w ∉ L(G). 


The idea in this example can be generalized and made into a theorem for context-free languages in 
general. 


Theorem 5.2 


Suppose that G = (V, T, S, P) is a context-free grammar that does not have any rules of the form 


A → λ, 


A → B, 


where A, B ∈ V. Then the exhaustive search parsing method can be made into an algorithm which, for 
any w ∈ Σ*, either produces a parsing of w or tells us that no parsing is possible. 


Proof: For each sentential form, consider both its length and the number of terminal symbols. Each 
step in the derivation increases at least one of these. Since neither the length of a sentential form nor 
the number of terminal symbols can exceed |w|, a derivation cannot involve more than 2|w| rounds, at 
which time we either have a successful parsing or w cannot be generated by the grammar. ■ 
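
The bounded search used in this proof can be sketched in a few lines of Python. This is only an illustration under some assumptions of our own: every symbol is a single character, the variables are exactly the keys of the dictionary `productions`, and the function name `exhaustive_parse` is hypothetical. The sketch keeps every partial leftmost derivation, discards sentential forms that are longer than w or whose terminal prefix disagrees with w, and gives up after 2|w| rounds.

```python
from typing import Dict, List, Optional


def exhaustive_parse(productions: Dict[str, List[str]], start: str,
                     w: str) -> Optional[List[str]]:
    """Return a leftmost derivation of w as a list of sentential forms, or None.

    Assumes the grammar has no rules A -> lambda and no rules A -> B,
    so that 2*|w| rounds are enough (Theorem 5.2).
    """
    if not w:
        return None  # lambda cannot be derived without lambda-productions
    frontier = [[start]]  # each entry is one partial derivation
    for _ in range(2 * len(w)):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            # Position of the leftmost variable, if there is one.
            pos = next((i for i, s in enumerate(form) if s in productions), None)
            if pos is None:
                continue  # a terminal string different from w
            if not w.startswith(form[:pos]):
                continue  # the terminal prefix already disagrees with w
            for rhs in productions[form[pos]]:
                new_form = form[:pos] + rhs + form[pos + 1:]
                if new_form == w:
                    return derivation + [new_form]
                if len(new_form) <= len(w):  # forms never shrink
                    next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None


# The grammar of Example 5.8.
example_5_8 = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}
print(exhaustive_parse(example_5_8, "S", "aabb"))
print(exhaustive_parse(example_5_8, "S", "abb"))
```

With the grammar of Example 5.8, the first call finds the derivation S ⇒ aSb ⇒ aabb, and the second stops with no parsing instead of running forever.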


While the exhaustive search method gives a theoretical guarantee that parsing can always be done, 
its practical usefulness is limited because the number of sentential forms generated by it may be 
excessively large. Exactly how many sentential forms are generated differs from case to case; no 
precise general result can be established, but we can put some rough upper bounds on it. If we restrict 
ourselves to leftmost derivations, we can have no more than |P| sentential forms after one round, no 
more than |P|^2 sentential forms after the second round, and so on. In the proof of Theorem 5.2, we 
observed that parsing cannot involve more than 2|w| rounds; therefore, the total number of sentential 
forms cannot exceed 

M = |P| + |P|^2 + ··· + |P|^{2|w|} = O(|P|^{2|w|+1}). (5.2) 

This indicates that the work for exhaustive search parsing may grow exponentially with the length of 
the string, making the cost of the method prohibitive. Of course, Equation (5.2) is only a bound, and 
often the number of sentential forms is much smaller. Nevertheless, practical observation shows that 
exhaustive search parsing is very inefficient in most cases. 
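
To get a feeling for how quickly this bound grows, we can simply evaluate the sum. The helper below is a hypothetical illustration, not part of any parsing method; it adds up |P|^i for i from 1 to 2|w|.

```python
def sentential_form_bound(num_productions: int, length: int) -> int:
    """Upper bound M on the number of sentential forms: sum of |P|^i, i = 1..2|w|."""
    return sum(num_productions ** i for i in range(1, 2 * length + 1))


# Five productions (as in Example 5.8) and strings of growing length.
for n in (1, 2, 4, 8):
    print(n, sentential_form_bound(5, n))
```

Even for five productions and a string of length four the bound is already in the hundreds of thousands, although the actual number of forms examined is usually far smaller.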


The construction of more efficient parsing methods for context-free grammars is a complicated 
matter that belongs to a course on compilers. We will not pursue it here except for some isolated 
results. 


Theorem 5.3 


For every context-free grammar there exists an algorithm that parses any w ∈ L(G) in a number of 
steps proportional to |w|^3. 

There are several known methods to achieve this, but all of them are sufficiently complicated that 
we cannot even describe them without developing some additional results. In Section 6.3 we will 
take this question up again briefly. More details can be found in Harrison 1978 and Hopcroft and 
Ullman 1979. One reason for not pursuing this in detail is that even these algorithms are 
unsatisfactory. A method in which the work rises with the third power of the length of the string, 
while better than an exponential algorithm, is still quite inefficient, and a parser based on it would 
need an excessive amount of time to analyze even a moderately long program. What we would like to 
have is a parsing method that takes time proportional to the length of the string. We refer to such a 
method as a linear time parsing algorithm. We do not know any linear time parsing methods for 
context-free languages in general, but such algorithms can be found for restricted, but important, 
special cases. 


Definition 5.4 


A context-free grammar G = (V, T, S, P) is said to be a simple grammar or s-grammar if all its 
productions are of the form 


A → ax, 


where A ∈ V, a ∈ T, x ∈ V*, and any pair (A, a) occurs at most once in P. 


Example 5.9 


The grammar 
S → aS | bSS | c 
is an s-grammar. The grammar 
S → aS | bSS | aSS | c 


is not an s-grammar because the pair (S, a) occurs in the two productions S → aS and S → aSS. 


While s-grammars are quite restrictive, they are of some interest. As we will see in the next 
section, many features of common programming languages can be described by s-grammars. 


If G is an s-grammar, then any string w in L(G) can be parsed with an effort proportional to |w|. To 
see this, look at the exhaustive search method and the string w = a_1 a_2 ··· a_n. Since there can be at most 
one rule with S on the left, and starting with a_1 on the right, the derivation must begin with 

S ⇒ a_1 A_1 A_2 ··· A_m. 

Next, we substitute for the variable A_1, but since again there is at most one choice, we must have 

S ⇒* a_1 a_2 B_1 B_2 ··· A_2 ··· A_m. 

We see from this that each step produces one terminal symbol and hence the whole process must be 
completed in no more than |w| steps. 
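
The argument above translates almost directly into a program. The following sketch assumes that an s-grammar is stored as a dictionary that maps each pair (A, a) to the string x of variables in the unique production A → ax; this representation and the name `parse_s_grammar` are our own choices for illustration. The pending variables are kept on a stack, and each input symbol triggers exactly one table lookup.

```python
from typing import Dict, List, Optional, Tuple


def parse_s_grammar(rules: Dict[Tuple[str, str], str], start: str,
                    w: str) -> Optional[List[Tuple[str, str]]]:
    """Parse w with an s-grammar in |w| steps.

    Returns the productions used, as (left side, right side) pairs in
    leftmost order, or None if w is not in the language.
    """
    stack = [start]  # variables still to be expanded, leftmost on top
    used = []
    for symbol in w:
        if not stack:
            return None  # input left over, but nothing to expand
        variable = stack.pop()
        x = rules.get((variable, symbol))
        if x is None:
            return None  # no production variable -> symbol x exists
        used.append((variable, symbol + x))
        stack.extend(reversed(x))  # leftmost variable of x ends up on top
    return used if not stack else None


# The s-grammar S -> aS | bSS | c of Example 5.9.
example_5_9 = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(parse_s_grammar(example_5_9, "S", "abcc"))
print(parse_s_grammar(example_5_9, "S", "abc"))
```

Because the pair (A, a) determines the production, there is never a choice to explore, which is exactly why the work is proportional to |w|.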


Ambiguity in Grammars and Languages 


On the basis of our argument we can claim that given any w € L(G), exhaustive search parsing will 
produce a derivation tree for w. We say “a” derivation tree rather than “the” derivation tree because 
of the possibility that a number of different derivation trees may exist. This situation is referred to as 
ambiguity. 


Definition 5.5 


A context-free grammar G is said to be ambiguous if there exists some w ∈ L(G) that has at least 
two distinct derivation trees. Alternatively, ambiguity implies the existence of two or more leftmost 
or rightmost derivations. 


Example 5.10 


The grammar in Example 5.4, with productions S → aSb | SS | λ, is ambiguous. The sentence aabb has 
the two derivation trees shown in Figure 5.4. 


Figure 5.4 


Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a 
variety of ways. In programming languages, where there should be only one interpretation of each 
statement, ambiguity must be removed when possible. Often we can achieve this by rewriting the 
grammar in an equivalent, unambiguous form. 


Example 5.11 


Consider the grammar G = (V, T, E, P) with 


V = {E, I}, 
T = {a, b, c, +, *, (, )}, 
and productions 
E → I, 
E → E + E, 
E → E * E, 
E → (E), 
I → a | b | c. 


The strings (a + b)*c and a*b + c are in L(G). It is easy to see that this grammar generates a restricted 
subset of arithmetic expressions for C-like programming languages. The grammar is ambiguous. For 
instance, the string a + b*c has two different derivation trees, as shown in Figure 5.5. 


Figure 5.5 


Two derivation trees for a + b*c. 
(a) (b) 


One way to resolve the ambiguity is, as is done in programming manuals, to associate precedence 
rules with the operators + and *. Since * normally has higher precedence than +, we would take 
Figure 5.5(a) as the correct parsing as it indicates that b*c is a subexpression to be evaluated before 
performing the addition. However, this resolution is completely outside the grammar. It is better to 
rewrite the grammar so that only one parsing is possible. 


Example 5.12 


To rewrite the grammar in Example 5.11 we introduce new variables, taking V as {E, T, F, I}, and 
replacing the productions with 


E → T, 
T → F, 
F → I, 
E → E + T, 
T → T * F, 
F → (E), 
I → a | b | c. 


A derivation tree of the sentence a + b * c is shown in Figure 5.6. No other derivation tree is possible 
for this string: The grammar is unambiguous. It is also equivalent to the grammar in Example 5.11. It 
is not too hard to justify these claims in this specific instance, but, in general, the questions of whether 
a given context-free grammar is ambiguous or whether two given context-free grammars are 
equivalent are very difficult to answer. In fact, we will later show that there are no general algorithms 
by which these questions can always be resolved. 
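
For these two particular grammars we can check the claims on individual strings by counting derivation trees. The sketch below is hand-written for the productions of Examples 5.11 and 5.12 only; it is not a general ambiguity test, and the function names are hypothetical. Each function tries every production for its variable and every position at which the operator in that production could occur, and multiplies the counts for the two sides.

```python
from functools import lru_cache

IDENTIFIERS = {"a", "b", "c"}  # the strings derived from I


@lru_cache(maxsize=None)
def trees_ambiguous(s: str) -> int:
    """Derivation trees for s from E in Example 5.11."""
    count = 1 if s in IDENTIFIERS else 0              # E -> I
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        count += trees_ambiguous(s[1:-1])             # E -> (E)
    for i, ch in enumerate(s):
        if ch in "+*":                                # E -> E+E | E*E
            count += trees_ambiguous(s[:i]) * trees_ambiguous(s[i + 1:])
    return count


@lru_cache(maxsize=None)
def trees_e(s: str) -> int:
    """Derivation trees for s from E in Example 5.12."""
    count = trees_t(s)                                # E -> T
    for i, ch in enumerate(s):
        if ch == "+":                                 # E -> E + T
            count += trees_e(s[:i]) * trees_t(s[i + 1:])
    return count


@lru_cache(maxsize=None)
def trees_t(s: str) -> int:
    count = trees_f(s)                                # T -> F
    for i, ch in enumerate(s):
        if ch == "*":                                 # T -> T * F
            count += trees_t(s[:i]) * trees_f(s[i + 1:])
    return count


@lru_cache(maxsize=None)
def trees_f(s: str) -> int:
    count = 1 if s in IDENTIFIERS else 0              # F -> I
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        count += trees_e(s[1:-1])                     # F -> (E)
    return count


for expr in ("a+b*c", "(a+b)*c", "a+b+c"):
    print(expr, trees_ambiguous(expr), trees_e(expr))
```

For a + b*c (written without blanks) the first grammar yields two trees and the second yields one, in agreement with Figures 5.5 and 5.6. Such spot checks can expose an ambiguity, but they cannot establish that a grammar is unambiguous.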


Figure 5.6 


In the foregoing example the ambiguity came from the grammar in the sense that it could be 
removed by finding an equivalent unambiguous grammar. In some instances, however, this is not 
possible because the ambiguity is in the language. 


Definition 5.6 


If L is a context-free language for which there exists an unambiguous grammar, then L is said to be 
unambiguous. If every grammar that generates L is ambiguous, then the language is called inherently 
ambiguous. 


It is a somewhat difficult matter even to exhibit an inherently ambiguous language. The best we 
can do here is give an example with some reasonably plausible claim that it is inherently ambiguous. 


Example 5.13 


The language 
L = {a^n b^n c^m} ∪ {a^n b^m c^m}, 


with n and m nonnegative, is an inherently ambiguous context-free language. 
That L is context-free is easy to show. Notice that 


L = L_1 ∪ L_2, 
where L_1 is generated by 
S_1 → S_1 c | A, 
A → aAb | λ 
and L_2 is given by an analogous grammar with start symbol S_2 and productions 
S_2 → aS_2 | B, 
B → bBc | λ. 
Then L is generated by the combination of these two grammars with the additional production 
S → S_1 | S_2. 


The grammar is ambiguous since the string a^n b^n c^n has two distinct derivations, one starting with 
S ⇒ S_1, the other with S ⇒ S_2. It does not, of course, follow from this that L is inherently 
ambiguous as there might exist some other unambiguous grammars for it. But in some way L_1 and L_2 
have conflicting requirements, the first putting a restriction on the number of a’s and b’s, while the 
second does the same for b’s and c’s. A few tries will quickly convince you of the impossibility of 
combining these requirements in a single set of rules that cover the case n = m uniquely. A rigorous 
argument, though, is quite technical. One proof can be found in Harrison 1978. 


EXERCISES 


1. Find an s-grammar for L(aaa*b + b). 
2. Find an s-grammar for L = {a^n b^n : n ≥ 1}. 
3. Find an s-grammar for L = {a^n b^{n+1} : n ≥ 2}. 


4. Show that every s-grammar is unambiguous. 


5. Let G = (V, T, S, P) be an s-grammar. Give an expression for the maximum size of P in terms of |V| 
and |T|. 


6. Show that the following grammar is ambiguous. 


S → AB | aaB, 
A → a | Aa, 
B → b. 
7. Construct an unambiguous grammar equivalent to the grammar in Exercise 6. 
8. Give the derivation tree for (((a + b) * c)) + a + b, using the grammar in Example 5.12. 
9. Show that a regular language cannot be inherently ambiguous. 
10. Give an unambiguous grammar that generates the set of all regular expressions on Σ = {a, b}. 
11. Is it possible for a regular grammar to be ambiguous? 
12. Show that the language L = {ww^R : w ∈ {a, b}*} is not inherently ambiguous. 


13. Show that the following grammar is ambiguous. 


S → aSbS | bSaS | λ. 


14. Show that the grammar in Example 5.4 is ambiguous, but that the language denoted by it is not. 
15. Show that the grammar in Example 1.13 is ambiguous. 
16. Show that the grammar in Example 5.5 is unambiguous. 


17. Use the exhaustive search parsing method to parse the string abbbbbb with the grammar in 
Example 5.5. In general, how many rounds will be needed to parse any string w in this language? 


18. Is the string aabbababb in the language generated by the grammar S — aSS|b? 
19. Show that the grammar in Example 1.14 is unambiguous. 


20. Prove the following result. Let G = (V, T, S, P) be a context-free grammar in which every A ∈ V 
occurs on the left side of at most one production. Then G is unambiguous. 


21. Find a grammar equivalent to that in Example 5.5 that satisfies the conditions of Theorem 5.2. 


5.3 Context-Free Grammars and Programming Languages 


One of the most important uses of the theory of formal languages is in the definition of programming 
languages and in the construction of interpreters and compilers for them. The basic problem here is to 
define a programming language precisely and to use this definition as the starting point for the writing 
of efficient and reliable translation programs. Both regular and context-free languages are important 
in achieving this. As we have seen, regular languages are used in the recognition of certain simple 
patterns that occur in programming languages, but as we argue in the introduction to this chapter, we 
need context-free languages to model more complicated aspects. 


As with most other languages, we can define a programming language by a grammar. It is 
traditional in writing on programming languages to use a convention for specifying grammars called 
the Backus-Naur form or BNF. This form is in essence the same as the notation we have used here, 
but the appearance is different. In BNF, variables are enclosed in triangular brackets. Terminal 
symbols are written without any special marking. BNF also uses subsidiary symbols such as |, much 
in the way we have done. Thus, the grammar in Example 5.12 might appear in BNF as 


⟨expression⟩ ::= ⟨term⟩ | ⟨expression⟩ + ⟨term⟩, 


⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ * ⟨factor⟩, 


and so on. The symbols + and * are terminals. The symbol | is used as an alternator as in our notation, 
but ::= is used instead of →. BNF descriptions of programming languages tend to use more explicit 
variable identifiers to make the intent of the production explicit. But otherwise there are no significant 
differences between the two notations. 


Many parts of C-like programming languages are susceptible to definition by restricted forms of 
context-free grammars. For example, the while statement in C can be defined as 


⟨while_statement⟩ ::= while ⟨expression⟩ ⟨statement⟩. 


Here the keyword while is a terminal symbol. All other terms are variables, which still have to be 
defined. If we check this against Definition 5.4, we see that this looks like an s-grammar production. 
The variable ⟨while_statement⟩ on the left is always associated with the terminal while on 
the right. For this reason such a statement is easily and efficiently parsed. We see here a reason why 
we use keywords in programming languages. Keywords not only provide some visual structure that 
can guide the reader of a program, but also make the work of a compiler much easier. 


Unfortunately, not all features of a typical programming language can be expressed by an 
s-grammar. The rules for ⟨expression⟩ above are not of this type, so that parsing becomes less 
obvious. The question then arises what grammatical rules we can permit and still parse efficiently. In 
compilers, extensive use has been made of what are called LL and LR grammars. These grammars 
have the ability to express the less obvious features of a programming language, yet allow us to parse 
in linear time. This is not a simple matter, and much of it is beyond the scope of our discussion. We 
will briefly touch on this topic in Chapter 6, but for our purposes it suffices to realize that such 
grammars exist and have been widely studied. 


In connection with this, the issue of ambiguity takes on added significance. The specification of a 
programming language must be unambiguous, otherwise a program may yield very different results 
when processed by different compilers or run on different systems. As Example 5.11 shows, a naive 
approach can easily introduce ambiguity in the grammar. To avoid such mistakes we must be able to 
recognize and remove ambiguities. A related question is whether a language is or is not inherently 
ambiguous. What we need for this purpose are algorithms for detecting and removing ambiguities in 
context-free grammars and for deciding whether or not a context-free language is inherently 
ambiguous. Unfortunately, these are very difficult tasks, impossible in the most general sense, as we 
will see later. 


Those aspects of a programming language that can be modeled by a context-free grammar are 
usually referred to as its syntax. However, it is normally the case that not all programs that are 
syntactically correct in this sense are in fact acceptable programs. For C, the usual BNF definition 
allows constructs such as 


char a, b, c; 
followed by 
c = 3.2; 


This combination is not acceptable to C compilers since it violates the constraint, “a character 
variable cannot be assigned a real value.” Context-free grammars cannot express the fact that type 
clashes may not be permitted. Such rules are part of programming language semantics, since they have 
to do with how we interpret the meaning of a particular construct. 


Programming language semantics are a complicated matter. Nothing as elegant and concise as 
context-free grammars exists for the specification of programming language semantics, and 
consequently some semantic features may be poorly defined or ambiguous. It is an ongoing concern 
both in programming languages and in formal language theory to find effective methods for defining 
programming language semantics. Several methods have been proposed, but none of them has been as 
universally accepted or as successful for semantic definition as context-free languages have 
been for syntax. 


EXERCISES 


1. Consult a book on C for formal definitions of the following constructs. 
(a) literal 
(b) for statement 
(c) if-else statement 
(d) do statement 
(e) compound statement 


(f) return statement 


2. Find examples of features of C that cannot be described by context-free grammars. 


Chapter 6 


Simplification of 
Context-Free 
Grammars and 
Normal Forms 


Before we can study context-free languages in greater depth, we must attend to some 
technical matters. The definition of a context-free grammar imposes no restriction 
whatsoever on the right side of a production. However, complete freedom is not necessary 
and, in fact, is a detriment in some arguments. In Theorem 5.2, we see the convenience of 
certain restrictions on grammatical forms; eliminating rules of the form A → λ and A → B 
makes the arguments easier. In many instances, it is desirable to place even more stringent restrictions 
on the grammar. Because of this, we need to look at methods for transforming an arbitrary context- 
free grammar into an equivalent one that satisfies certain restrictions on its form. In this chapter we 
study several transformations and substitutions that will be useful in subsequent discussions. 


We also investigate normal forms for context-free grammars. A normal form is one that, although 
restricted, is broad enough so that any grammar has an equivalent normal-form version. We introduce 
two of the most useful of these, the Chomsky normal form and the Greibach normal form. Both 
have many practical and theoretical uses. An immediate application of the Chomsky normal form to 
parsing is given in Section 6.3. 


The somewhat tedious nature of the material in this chapter lies in the fact that many of the 
arguments are manipulative and give little intuitive insight. For our purposes, this technical aspect is 
relatively unimportant and can be read casually. The various conclusions are significant; they will be 
used many times in later discussions. 


6.1 Methods for Transforming Grammars 


We first raise an issue that is somewhat of a nuisance with grammars and languages in general: the 
presence of the empty string. The empty string plays a rather singular role in many theorems and 
proofs, and it is often necessary to give it special attention. We prefer to remove it from consideration 
altogether, looking only at languages that do not contain λ. In doing so, we do not lose generality, as 
we see from the following considerations. Let L be any context-free language, and let G = (V, T, S, P) 
be a context-free grammar for L − {λ}. Then the grammar we obtain by adding to V the new variable 
S_0, making S_0 the start variable, and adding to P the productions 


S_0 → S | λ 


generates L. Therefore, any nontrivial conclusion we can make for L − {λ} will almost certainly 
transfer to L. Also, given any context-free grammar G, there is a method for obtaining Ĝ such that 
L(Ĝ) = L(G) − {λ} (see Exercises 13 and 14 at the end of this section). Consequently, for all 
practical purposes, there is no difference between context-free languages that include λ and those that 
do not. For the rest of this chapter, unless otherwise stated, we will restrict our discussion to λ-free 
languages. 


A Useful Substitution Rule 


Many rules govern generating equivalent grammars by means of substitutions. Here we give one that 
is very useful for simplifying grammars in various ways. We will not define the term simplification 
precisely, but we will use it nevertheless. What we mean by it is the removal of certain types of 
undesirable productions; the process does not necessarily result in an actual reduction of the number 
of rules. 


Theorem 6.1 


Let G = (V, T, S, P) be a context-free grammar. Suppose that P contains a production of the form 
A → x_1 B x_2. 
Assume that A and B are different variables and that 


B → y_1 | y_2 | ··· | y_n 


is the set of all productions in P that have B as the left side. Let Ĝ = (V, T, S, P̂) be the grammar in 
which P̂ is constructed by deleting 


A → x_1 B x_2 (6.1) 
from P, and adding to it 
A → x_1 y_1 x_2 | x_1 y_2 x_2 | ··· | x_1 y_n x_2. 
Then 
L(Ĝ) = L(G). 
Proof: Suppose that w ∈ L(G), so that 

S ⇒*_G w. 

The subscript on the derivation sign ⇒ is used here to distinguish between derivations with different 
grammars. If this derivation does not involve the production (6.1), then obviously 

S ⇒*_Ĝ w. 

If it does, then look at the derivation the first time (6.1) is used. The B so introduced eventually has to 
be replaced; we lose nothing by assuming that this is done immediately (see Exercise 18 at the end of 
this section). Thus 

S ⇒*_G u_1 A u_2 ⇒_G u_1 x_1 B x_2 u_2 ⇒_G u_1 x_1 y_j x_2 u_2. 

But with grammar Ĝ we can get 

S ⇒*_Ĝ u_1 A u_2 ⇒_Ĝ u_1 x_1 y_j x_2 u_2. 

Thus we can reach the same sentential form with G and Ĝ. If (6.1) is used again later, we can repeat 
the argument. It follows then, by induction on the number of times the production is applied, that 

S ⇒*_Ĝ w. 

Therefore, if w ∈ L(G), then w ∈ L(Ĝ). 

By similar reasoning, we can show that if w ∈ L(Ĝ) then w ∈ L(G), completing the proof. ■ 


Theorem 6.1 is a simple and quite intuitive substitution rule: A production A → x_1 B x_2 can be 
eliminated from a grammar if we put in its place the set of productions in which B is replaced by all 
strings it derives in one step. In this result, it is necessary that A and B be different variables. The 
case when A = B is partially addressed in Exercises 23 and 24 at the end of this section. 


Example 6.1 


Consider G = ({A, B}, {a, b, c}, A, P) with productions 


A → a | aaA | abBc, 
B → abbA | b. 


Using the suggested substitution for the variable B, we get the grammar Ĝ with productions 


A → a | aaA | ababbAc | abbc, 
B → abbA | b. 


The new grammar Ĝ is equivalent to G. The string aaabbc has the derivation 


A ⇒ aaA ⇒ aaabBc ⇒ aaabbc 
in G, and the corresponding derivation 
A ⇒ aaA ⇒ aaabbc 
in Ĝ. 
Notice that, in this case, the variable B and its associated productions are still in the grammar 
even though they can no longer play a part in any derivation. We will next show how such 
unnecessary productions can be removed from a grammar. 
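
The substitution of Theorem 6.1 is mechanical enough to be written as a short function. In this sketch a grammar is assumed to be a dictionary from each variable to the list of its right sides, with every symbol a single character; the function name `substitute` and its argument order are hypothetical.

```python
from typing import Dict, List


def substitute(productions: Dict[str, List[str]], a: str, x1: str,
               b: str, x2: str) -> Dict[str, List[str]]:
    """Replace the production a -> x1 b x2 by a -> x1 y x2 for every b -> y."""
    if a == b:
        raise ValueError("Theorem 6.1 requires two different variables")
    target = x1 + b + x2
    if target not in productions[a]:
        raise ValueError("production not found: " + a + " -> " + target)
    new_rhs = []
    for rhs in productions[a]:
        if rhs == target:
            # Put one new production in its place for each right side of b.
            new_rhs.extend(x1 + y + x2 for y in productions[b])
        else:
            new_rhs.append(rhs)
    result = dict(productions)  # the productions of b are left untouched
    result[a] = new_rhs
    return result


# Example 6.1: substitute for B in A -> abBc.
example_6_1 = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
print(substitute(example_6_1, "A", "ab", "B", "c"))
```

Applied to the grammar of Example 6.1, the function reproduces the productions of Ĝ given above, including the now unreachable productions for B.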


Removing Useless Productions 


One invariably wants to remove productions from a grammar that can never take part in any 
derivation. For example, in the grammar whose entire production set is 


S → aSb | λ | A, 
A → aA, 


the production S → A clearly plays no role, as A cannot be transformed into a terminal string. While A 
can occur in a string derived from S, this can never lead to a sentence. Removing this production 
leaves the language unaffected and is a simplification by any definition. 


Definition 6.1 


Let G = (V, T, S, P) be a context-free grammar. A variable A ∈ V is said to be useful if and only if 
there is at least one w ∈ L(G) such that 


S ⇒* xAy ⇒* w, (6.2) 


with x, y in (V ∪ T)*. In words, a variable is useful if and only if it occurs in at least one derivation. 
A variable that is not useful is called useless. A production is useless if it involves any useless 
variable. 


Example 6.2 


A variable may be useless because there is no way of getting a terminal string from it. The case just 
mentioned is of this kind. Another reason a variable may be useless is shown in the next grammar. In 
a grammar with start symbol S and productions 


S → A, 
A → aA | λ, 
B → bA, 


the variable B is useless and so is the production B → bA. Although B can derive a terminal string, 
there is no way we can achieve S ⇒* xBy. 


This example illustrates the two reasons why a variable is useless: either because it cannot be 
reached from the start symbol or because it cannot derive a terminal string. A procedure for removing 
useless variables and productions is based on recognizing these two situations. Before we present the 
general case and the corresponding theorem, let us look at another example. 


Example 6.3 


Eliminate useless symbols and productions from G = (V, T, S, P), where V = {S, A, B, C} and T = {a, 
b}, with P consisting of 


S → aS | A | C, 
A → a, 
B → aa, 
C → aCb. 


First, we identify the set of variables that can lead to a terminal string. Because A → a and B → 
aa, the variables A and B belong to this set. So does S, because S ⇒ A ⇒ a. However, this argument 
cannot be made for C, thus identifying it as useless. Removing C and its corresponding productions, 
we are led to the grammar G_1 with variables V_1 = {S, A, B}, terminals T = {a}, and productions 


S → aS | A, 
A → a, 
B → aa. 


Next we want to eliminate the variables that cannot be reached from the start variable. For this, 
we can draw a dependency graph for the variables. Dependency graphs are a way of visualizing 
complex relationships and are found in many applications. For context-free grammars, a dependency 
graph has its vertices labeled with variables, with an edge between vertices C and D if and only if 
there is a production of the form 


C → xDy. 


A dependency graph for V_1 is shown in Figure 6.1. A variable is useful only if there is a path from the 
vertex labeled S to the vertex labeled with that variable. In our case, Figure 6.1 shows that B is 
useless. Removing it and the affected productions and terminals, we are led to the final answer 
Ĝ = (V̂, T̂, S, P̂) with V̂ = {S, A}, T̂ = {a}, and productions 


S → aS | A, 
A → a. 
The formalization of this process leads to a general construction and the corresponding theorem. 


Figure 6.1 


Theorem 6.2 


Let G = (V, T, S, P) be a context-free grammar. Then there exists an equivalent grammar 
Ĝ = (V̂, T̂, S, P̂) 
that does not contain any useless variables or productions. 


Proof: The grammar Ĝ can be generated from G by an algorithm consisting of two parts. In the first 
part we construct an intermediate grammar G_1 = (V_1, T_2, S, P_1) such that V_1 contains only variables A 
for which 


A ⇒* w ∈ T* 


is possible. The steps in the algorithm are 


1. Set V_1 to ∅. 


2. Repeat the following step until no more variables are added to V_1. For every A ∈ V for which P 
has a production of the form 


A → x_1 x_2 ··· x_n, with all x_i in V_1 ∪ T, 


add A to V_1. 
3. Take P_1 as all the productions in P whose symbols are all in (V_1 ∪ T). 


Clearly this procedure terminates. It is equally clear that if A ∈ V_1, then A ⇒* w ∈ T* is a 
possible derivation with G_1. The remaining issue is whether every A for which A ⇒* w = ab··· is 
added to V_1 before the procedure terminates. To see this, consider any such A and look at the partial 
derivation tree corresponding to that derivation (Figure 6.2). At level k, there are only terminals, so 
every variable A_i at level k − 1 will be added to V_1 on the first pass through Step 2 of the algorithm. 
Any variable at level k − 2 will then be added to V_1 on the second pass through Step 2. The third time 
through Step 2, all variables at level k − 3 will be added, and so on. The algorithm cannot terminate 
while there are variables in the tree that are not yet in V_1. Hence A will eventually be added to V_1. 


Figure 6.2 
In the second part of the construction, we get the final answer Ĝ from G_1. We draw the variable 
dependency graph for G_1 and from it find all variables that cannot be reached from S. These are 
removed from the variable set, as are the productions involving them. We can also eliminate any 
terminal that does not occur in some useful production. The result is the grammar Ĝ = (V̂, T̂, S, P̂). 

Because of the construction, Ĝ does not contain any useless symbols or productions. Also, for 
each w ∈ L(G) we have a derivation 


S ⇒* xAy ⇒* w. 


Since the construction of Ĝ retains A and all associated productions, we have everything needed to 
make the derivation 


S ⇒*_Ĝ xAy ⇒*_Ĝ w. 


The grammar Ĝ is constructed from G by the removal of productions, so that P̂ ⊆ P. 
Consequently L(Ĝ) ⊆ L(G). Putting the two results together, we see that G and Ĝ are equivalent. 
■ 
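
The two parts of this construction can be sketched as follows. The representation is an assumption made for the illustration: a grammar is a dictionary from variables to lists of right sides, every symbol is one character, and λ is the empty string. The name `remove_useless` is hypothetical. Part 1 repeats Step 2 until V_1 stops growing; part 2 replaces the drawing of the dependency graph by a search from the start variable.

```python
from typing import Dict, List, Set


def remove_useless(productions: Dict[str, List[str]], start: str,
                   terminals: Set[str]) -> Dict[str, List[str]]:
    """Return the productions that involve only useful variables."""
    # Part 1: collect the variables that can derive a terminal string (V_1).
    v1: Set[str] = set()

    def usable(rhs: str) -> bool:
        return all(s in terminals or s in v1 for s in rhs)

    changed = True
    while changed:
        changed = False
        for var, alternatives in productions.items():
            if var not in v1 and any(usable(rhs) for rhs in alternatives):
                v1.add(var)
                changed = True
    p1 = {var: [rhs for rhs in alts if usable(rhs)]
          for var, alts in productions.items() if var in v1}

    # Part 2: keep only the variables reachable from the start variable.
    reachable: Set[str] = set()
    pending = [start] if start in p1 else []
    while pending:
        var = pending.pop()
        if var in reachable:
            continue
        reachable.add(var)
        for rhs in p1[var]:
            pending.extend(s for s in rhs if s in p1)
    return {var: alts for var, alts in p1.items() if var in reachable}


# Example 6.3.
example_6_3 = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(example_6_3, "S", {"a", "b"}))
```

For the grammar of Example 6.3 the first part discards C and the second part discards B, leaving S → aS | A and A → a. The order matters: removing unreachable variables first could leave behind variables that become unreachable only after the non-generating ones are deleted.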


Removing λ-Productions 


One kind of production that is sometimes undesirable is one in which the right side is the empty 
string. 


Definition 6.2 


Any production of a context-free grammar of the form 
A → λ
is called a λ-production. Any variable A for which the derivation
A ⇒* λ (6.3)

is possible is called nullable.

A grammar may generate a language not containing λ, yet have some λ-productions or nullable
variables. In such cases, the λ-productions can be removed.


Example 6.4 


Consider the grammar 


S → aS₁b,
S₁ → aS₁b | λ,

with start variable S. This grammar generates the λ-free language {aⁿbⁿ : n ≥ 1}. The λ-production
S₁ → λ can be removed after adding new productions obtained by substituting λ for S₁ where it occurs
on the right. Doing this we get the grammar

S → aS₁b | ab,
S₁ → aS₁b | ab.


We can easily show that this new grammar generates the same language as the original one. 


In more general situations, substitutions for λ-productions can be made in a similar, although more
complicated, manner.


Theorem 6.3 


Let G be any context-free grammar with λ not in L(G). Then there exists an equivalent grammar Ĝ
having no λ-productions.


Proof: We first find the set V_N of all nullable variables of G, using the following steps.
1. For all productions A → λ, put A into V_N.

2. Repeat the following step until no further variables are added to V_N.

For all productions
B → A₁A₂···Aₙ,

where A₁, A₂, ..., Aₙ are in V_N, put B into V_N.
Once the set V_N has been found, we are ready to construct P̂. To do so, we look at all productions in
P of the form
A → x₁x₂···xₘ, m ≥ 1,

where each xᵢ ∈ V ∪ T. For each such production of P, we put into P̂ that production as well as all
those generated by replacing nullable variables with λ in all possible combinations. For example, if
xᵢ and xⱼ are both nullable, there will be one production in P̂ with xᵢ replaced with λ, one in which xⱼ
is replaced with λ, and one in which both xᵢ and xⱼ are replaced with λ. There is one exception: If all
xᵢ are nullable, the production A → λ is not put into P̂.

The argument that this grammar Ĝ is equivalent to G is straightforward and will be left to the
reader. ■
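
The construction in this proof can be sketched as follows. This is an illustrative sketch only: the names nullable_variables and remove_lambda_productions are hypothetical, and a grammar is assumed to be a dictionary from each variable to a set of right sides, with each right side a tuple of symbols and the empty tuple standing for λ.

```python
from itertools import product


def nullable_variables(productions):
    """Compute the set V_N of nullable variables (steps 1 and 2 of the proof)."""
    # Step 1: variables with a production A -> lambda (empty tuple).
    nullable = {v for v, rhss in productions.items() if () in rhss}
    # Step 2: repeat until no further variables are added.
    changed = True
    while changed:
        changed = False
        for v, rhss in productions.items():
            if v in nullable:
                continue
            if any(rhs and all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(v)
                changed = True
    return nullable


def remove_lambda_productions(productions):
    """Build the production set of the lambda-free grammar."""
    nullable = nullable_variables(productions)
    result = {}
    for v, rhss in productions.items():
        new_rhss = set()
        for rhs in rhss:
            # Each nullable symbol may be kept or replaced with lambda;
            # every other symbol must be kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                if candidate:  # the exception: never add A -> lambda
                    new_rhss.add(candidate)
        result[v] = new_rhss
    return result
```

The dictionary keeps a key for every original variable, even if its set of right sides becomes empty; removing such leftover variables is the job of the useless-production construction, not of this one.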


Example 6.5 


Find a context-free grammar without λ-productions equivalent to the grammar defined by

S → ABaC,
A → BC,
B → b | λ,
C → D | λ,
D → d.

From the first step of the construction in Theorem 6.3, we find that the nullable variables are A, B, C.
Then, following the second step of the construction, we get

S → ABaC | BaC | AaC | ABa | aC | Aa | Ba | a,
A → B | C | BC,
B → b,
C → D,
D → d.


Removing Unit-Productions 


As we have seen in Theorem 5.2, productions in which both sides are a single variable are at times 
undesirable. 


Definition 6.3 


Any production of a context-free grammar of the form 
A → B,

where A, B ∈ V, is called a unit-production.


To remove unit-productions, we use the substitution rule discussed in Theorem 6.1. As the 
construction in the next theorem shows, this can be done if we proceed with some care. 


Theorem 6.4 


Let G = (V, T, S, P) be any context-free grammar without λ-productions. Then there exists a
context-free grammar Ĝ = (V̂, T̂, S, P̂) that does not have any unit-productions and that is
equivalent to G.


Proof: Obviously, any unit-production of the form A → A can be removed from the grammar without
effect, and we need only consider A → B, where A and B are different variables. At first sight, it may
seem that we can use Theorem 6.1 directly with x₁ = x₂ = λ to replace

A → B
with
A → y₁ | y₂ | ··· | yₙ.

But this will not always work; in the special case

A → B,
B → A,

the unit-productions are not removed. To get around this, we first find, for each A, all variables B
such that

A ⇒* B. (6.4)

We can do this by drawing a dependency graph with an edge (C, D) whenever the grammar has a
unit-production C → D; then (6.4) holds whenever there is a walk from A to B. The new
grammar Ĝ is generated by first putting into P̂ all non-unit productions of P. Next, for all A and B
satisfying (6.4), we add to P̂

A → y₁ | y₂ | ··· | yₙ,

where B → y₁ | y₂ | ··· | yₙ is the set of all rules in P̂ with B on the left. Note that since
B → y₁ | y₂ | ··· | yₙ is taken from P̂, none of the yᵢ can be a single variable, so that no
unit-productions are created by the last step.

To show that the resulting grammar is equivalent to the original one, we can follow the same line
of reasoning as in Theorem 6.1. ■
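
A minimal sketch of this construction follows. The function name remove_unit_productions and the grammar representation (a dictionary from each variable to a set of right sides, each a tuple of symbols) are assumptions made for illustration; the grammar is assumed to have no λ-productions, and every variable is assumed to appear as a key.

```python
def remove_unit_productions(productions):
    """Remove unit-productions following the dependency-graph construction."""
    variables = set(productions)

    def is_unit(rhs):
        # A unit-production has a single variable on its right side.
        return len(rhs) == 1 and rhs[0] in variables

    # unit_reach[A] = all B different from A with A =>* B via unit-productions,
    # found by walking the dependency graph of unit-productions.
    unit_reach = {}
    for a in variables:
        seen, stack = set(), [a]
        while stack:
            c = stack.pop()
            for rhs in productions[c]:
                if is_unit(rhs) and rhs[0] not in seen:
                    seen.add(rhs[0])
                    stack.append(rhs[0])
        seen.discard(a)
        unit_reach[a] = seen

    # First put in all non-unit productions of P.
    non_unit = {
        v: {rhs for rhs in rhss if not is_unit(rhs)}
        for v, rhss in productions.items()
    }

    # Then, for all A and B with A =>* B, add A -> y for every
    # non-unit production B -> y.
    result = {v: set(rhss) for v, rhss in non_unit.items()}
    for a in variables:
        for b in unit_reach[a]:
            result[a] |= non_unit[b]
    return result
```

Because the added right sides are copied only from the non-unit productions, the cyclic case A → B, B → A causes no difficulty in this sketch.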


Example 6.6
Remove all unit-productions from
S → Aa | B,
B → A | bb,
A → a | bc | B.

The dependency graph for the unit-productions is given in Figure 6.3; we see from it that
S ⇒* A, S ⇒* B, B ⇒* A, and A ⇒* B.

Figure 6.3

Hence, we add to the original non-unit productions

S → Aa,
A → a | bc,
B → bb,
the new rules
S → a | bc | bb,
A → bb,
B → a | bc,
to obtain the equivalent grammar
S → a | bc | bb | Aa,
A → a | bb | bc,
B → a | bb | bc.


Note that the removal of the unit-productions has made B and the associated productions useless. 


We can put all these results together to show that grammars for context-free languages can be
made free of useless productions, λ-productions, and unit-productions.


Theorem 6.5 


Let L be a context-free language that does not contain λ. Then there exists a context-free grammar that
generates L and that does not have any useless productions, λ-productions, or unit-productions.

Proof: The procedures given in Theorems 6.2, 6.3, and 6.4 remove these kinds of productions in turn.
The only point that needs consideration is that the removal of one type of production may introduce
productions of another type; for example, the procedure for removing λ-productions can create new
unit-productions. Also, Theorem 6.4 requires that the grammar have no λ-productions. But note that
the removal of unit-productions does not create λ-productions (Exercise 16 at the end of this section),
and the removal of useless productions does not create λ-productions or unit-productions (Exercise
17 at the end of this section). Therefore, we can remove all undesirable productions using the
following sequence of steps:

1. Remove λ-productions.
2. Remove unit-productions.
3. Remove useless productions.

The result will then have none of these productions, and the theorem is proved. ■


EXERCISES 


1. Complete the proof of Theorem 6.1 by showing that

S ⇒*_Ĝ w

implies

S ⇒*_G w.


2. In Example 6.1, show a derivation tree for the string ababbac, using both the original and the 
modified grammar. 
3. Show that the two grammars 
S → abAB | ba,
A → aaa,
B → aA | bb
and
S → abAaA | abAbb | ba,
A → aaa


are equivalent. 


4. In Theorem 6.1, why is it necessary to assume that A and B are different variables? 


5. Eliminate all useless productions from the grammar 


S → aS | AB,
A → bA,
B → AA.


What language does this grammar generate? 


6. Eliminate useless productions from 


S → a | aA | B | C,
A → aB | λ,
B → Aa,
C → CD,
D → ddd.


7. Eliminate all λ-productions from

S → AaB | aaB,
A → λ,
B → bbA | λ.

8. Remove all unit-productions, all useless productions, and all λ-productions from the grammar

S → aA | aBB,
A → aaA | λ,
B → bB | bbC,
C → B.


What language does this grammar generate? 


9. Eliminate all unit-productions from the grammar in Exercise 6. 
10. Complete the proof of Theorem 6.3. 
11. Complete the proof of Theorem 6.4. 


12. Use the construction in Theorem 6.3 to remove λ-productions from the grammar in Example 5.4.
What language does the resulting grammar generate? 


13. Consider the grammar G with productions 


S → A | B,
A → λ,
B → aBb,
B → b.

Construct a grammar Ĝ by applying the algorithm in Theorem 6.3 to G. What is the difference
between L(G) and L(Ĝ)?

14. Suppose that G is a context-free grammar for which λ ∈ L(G). Show that if we apply the
construction in Theorem 6.3, we obtain a new grammar Ĝ such that L(Ĝ) = L(G) − {λ}.

15. Give an example of a situation in which the removal of λ-productions introduces previously
nonexistent unit-productions.

16. Let G be a grammar without λ-productions, but possibly with some unit-productions. Show that
the construction of Theorem 6.4 does not then introduce any λ-productions.

17. Show that if a grammar has no λ-productions and no unit-productions, then the removal of useless
productions by the construction of Theorem 6.2 does not introduce any such productions.


18. Justify the claim made in the proof of Theorem 6.1 that the variable B can be replaced as soon as 
it appears. 


19. Suppose that a context-free grammar G = (V, T, S, P) has a production of the form 
A → xy,

where x, y ∈ (V ∪ T)⁺. Prove that if this rule is replaced by

A → By,
B → x,

where B ∉ V, then the resulting grammar is equivalent to the original one.


20. Consider the procedure suggested in Theorem 6.2 for the removal of useless productions. 
Reverse the order of the two parts, first eliminating variables that cannot be reached from S, then 
removing those that do not yield a terminal string. Does the new procedure still work correctly? If 
so, prove it. If not, give a counterexample. 


21. It is possible to define the term simplification precisely by introducing the concept of 
complexity of a grammar. This can be done in many ways; one of them is through the length of all 
the strings giving the production rules. For example, we might use 


complexity(G) = Σ_{A → v ∈ P} {1 + |v|}.

Show that the removal of useless productions always reduces the complexity in this sense. What
can you say about the removal of λ-productions and unit-productions?

22. A context-free grammar G is said to be minimal for a given language L if complexity(G) ≤
complexity(Ĝ) for any Ĝ generating L. Show by example that the removal of useless productions
does not necessarily produce a minimal grammar.


*23. Prove the following result. Let G = (V, T, S, P) be a context-free grammar. Divide the set of
productions whose left sides are some given variable (say, A), into two disjoint subsets

A → Ax₁ | Ax₂ | ··· | Axₙ,

A → y₁ | y₂ | ··· | yₘ,

where xᵢ, yᵢ are in (V ∪ T)*, but A is not a prefix of any yᵢ. Consider the grammar
Ĝ = (V ∪ {Z}, T, S, P̂), where Z ∉ V and P̂ is obtained by replacing all productions that
have A on the left by
A → yᵢ | yᵢZ, i = 1, 2, ..., m,
Z → xᵢ | xᵢZ, i = 1, 2, ..., n.

Then L(G) = L(Ĝ).
24. Use the result of the preceding exercise to rewrite the grammar

A → Aa | aBc | λ,
B → Bb | bc

so that it no longer has productions of the form A → Ax or B → Bx.


*25. Prove the following counterpart of Exercise 23. Let the set of productions involving the variable
A on the left be divided into two disjoint subsets

A → x₁A | x₂A | ··· | xₙA
and
A → y₁ | y₂ | ··· | yₘ,

where A is not a suffix of any yᵢ. Show that the grammar obtained by replacing these productions
with

A → yᵢ | Zyᵢ, i = 1, 2, ..., m,
Z → xᵢ | Zxᵢ, i = 1, 2, ..., n,

is equivalent to the original grammar.


6.2 Two Important Normal Forms 


There are many kinds of normal forms we can establish for context-free grammars. Some of these, 
because of their wide usefulness, have been studied extensively. We consider two of them briefly. 


Chomsky Normal Form 


One kind of normal form we can look for is one in which the number of symbols on the right of a 
production is strictly limited. In particular, we can ask that the string on the right of a production 
consist of no more than two symbols. One instance of this is the Chomsky normal form. 


Definition 6.4 


A context-free grammar is in Chomsky normal form if all productions are of the form 


A → BC

or

A → a,

where A, B, C are in V, and a is in T.


Example 6.7 


The grammar 


S → AS | a,
A → SA | b
is in Chomsky normal form. The grammar
S → AS | AAS,
A → SA | aa

is not; both productions S → AAS and A → aa violate the conditions of Definition 6.4.


Theorem 6.6 


Any context-free grammar G = (V, T, S, P) with λ ∉ L(G) has an equivalent grammar
Ĝ = (V̂, T̂, S, P̂) in Chomsky normal form.

Proof: Because of Theorem 6.5, we can assume without loss of generality that G has no
λ-productions and no unit-productions. The construction of Ĝ will be done in two steps.
Step 1: Construct a grammar G₁ = (V₁, T, S, P₁) from G by considering all productions in P in the
form

A → x₁x₂···xₙ, (6.5)

where each xᵢ is a symbol either in V or in T. If n = 1, then x₁ must be a terminal since we have no
unit-productions. In this case, put the production into P₁. If n ≥ 2, introduce new variables B_a for each
a ∈ T. For each production of P in the form (6.5) we put into P₁ the production

A → C₁C₂···Cₙ,
where
Cᵢ = xᵢ if xᵢ is in V,
and
Cᵢ = B_a if xᵢ = a.
For every B_a we also put into P₁ the production

B_a → a.

This part of the algorithm removes all terminals from productions whose right side has length greater
than one, replacing them with newly introduced variables. At the end of this step we have a grammar
G₁ all of whose productions have the form

A → a, (6.6)
or
A → C₁C₂···Cₙ, (6.7)
where Cᵢ ∈ V₁.
It is an easy consequence of Theorem 6.1 that
L(G₁) = L(G).


Step 2: In the second step, we introduce additional variables to reduce the length of the right sides of
the productions where necessary. First we put all productions of the form (6.6) as well as all the
productions of the form (6.7) with n = 2 into P̂. For n > 2, we introduce new variables D₁, D₂, ... and
put into P̂ the productions

A → C₁D₁,
D₁ → C₂D₂,
⋮
Dₙ₋₂ → Cₙ₋₁Cₙ.

Obviously, the resulting grammar Ĝ is in Chomsky normal form. Repeated applications of Theorem
6.1 will show that L(G₁) = L(Ĝ), so that

L(Ĝ) = L(G).

This somewhat informal argument can easily be made more precise. We will leave this to the
reader. ■
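
The two steps of this construction can be sketched in code. This is a hedged illustration rather than a definitive implementation: the function name to_chomsky_normal_form and the grammar representation (a dictionary from each variable to a list of right sides, each a tuple of symbols) are assumptions, as are the generated names "B_a" and "D_1", which are assumed not to clash with existing symbols. The input is assumed to have no λ-productions and no unit-productions.

```python
def to_chomsky_normal_form(productions):
    """Convert a grammar without lambda- and unit-productions to CNF."""
    variables = set(productions)

    # Step 1: in right sides of length >= 2, replace each terminal a
    # with a new variable B_a, and add the production B_a -> a.
    terminal_var = {}
    p1 = {}
    for left, rhss in productions.items():
        for rhs in rhss:
            if len(rhs) >= 2:
                rhs = tuple(
                    s if s in variables else terminal_var.setdefault(s, "B_" + s)
                    for s in rhs
                )
            p1.setdefault(left, []).append(rhs)
    for terminal, new_var in terminal_var.items():
        p1.setdefault(new_var, []).append((terminal,))

    # Step 2: break right sides longer than two symbols into a chain
    # A -> C1 D1, D1 -> C2 D2, ..., using fresh variables D_1, D_2, ...
    result = {}
    counter = 0
    for left, rhss in p1.items():
        for rhs in rhss:
            while len(rhs) > 2:
                counter += 1
                fresh = "D_" + str(counter)
                result.setdefault(left, []).append((rhs[0], fresh))
                left, rhs = fresh, rhs[1:]
            result.setdefault(left, []).append(rhs)
    return result
```

The sketch numbers the D variables globally instead of restarting for each production, which is one simple way to keep the new variables distinct.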


Example 6.8 


Convert the grammar with productions 


S → ABa,
A → aab,
B → Ac

to Chomsky normal form.

As required by the construction of Theorem 6.6, the grammar does not have any λ-productions or
any unit-productions.

In Step 1, we introduce new variables B_a, B_b, B_c and use the algorithm to get

S → ABB_a,
A → B_aB_aB_b,
B → AB_c,
B_a → a,
B_b → b,
B_c → c.

In the second step, we introduce additional variables to get the first two productions into normal
form and we get the final result

S → AD₁,
D₁ → BB_a,
A → B_aD₂,
D₂ → B_aB_b,
B → AB_c,
B_a → a,
B_b → b,
B_c → c.


Greibach Normal Form 


Another useful grammatical form is the Greibach normal form. Here we put restrictions not on the 
length of the right sides of a production, but on the positions in which terminals and variables can 
appear. Arguments justifying Greibach normal form are a little complicated and not very transparent. 
Similarly, constructing a grammar in Greibach normal form equivalent to a given context-free 
grammar is tedious. We therefore deal with this matter very briefly. Nevertheless, Greibach normal 
form has many theoretical and practical consequences. 


Definition 6.5 


A context-free grammar is said to be in Greibach normal form if all productions have the form 
A → ax,
where a ∈ T and x ∈ V*.
If we compare this with Definition 5.4, we see that the form A → ax is common to both Greibach
normal form and s-grammars, but Greibach normal form does not carry the restriction that the pair (A,


a) occur at most once. This additional freedom gives Greibach normal form a generality not 
possessed by s-grammars. 


If a grammar is not in Greibach normal form, we may be able to rewrite it in this form with some 
of the techniques encountered above. Here are two simple examples. 


Example 6.9 


The grammar 


S → AB,
A → aA | bB | b,
B → b

is not in Greibach normal form. However, using the substitution given by Theorem 6.1, we
immediately get the equivalent grammar

S → aAB | bBB | bB,
A → aA | bB | b,
B → b,


which is in Greibach normal form. 


Example 6.10 


Convert the grammar 
S → abSb | aa

into Greibach normal form.

Here we can use a device similar to the one introduced in the construction of Chomsky normal
form. We introduce new variables A and B that are essentially synonyms for a and b, respectively.
Substituting for the terminals with their associated variables leads to the equivalent grammar

S → aBSB | aA,
A → a,
B → b,

which is in Greibach normal form. 


In general, though, neither the conversion of a given grammar to Greibach normal form nor the 
proof that this can always be done is a simple matter. We introduce Greibach normal form here 
because it will simplify the technical discussion of an important result in the next chapter. However, 
from a conceptual viewpoint, Greibach normal form plays no further role in our discussion, so we 
only quote the following general result without proof. 


Theorem 6.7 


For every context-free grammar G with λ ∉ L(G), there exists an equivalent grammar Ĝ in Greibach
normal form.


EXERCISES 


1. Provide the details of the proof of Theorem 6.6. 


2. Convert the grammar S → aSb | ab into Chomsky normal form.

3. Transform the grammar S → aSaA | A, A → abA | b into Chomsky normal form.

4. Transform the grammar with productions

S → abAB,
A → bAB | λ,
B → BAa | A | λ
into Chomsky normal form.
5. Convert the grammar
S → AB | aB,
A → aab | λ,
B → bbA

into Chomsky normal form.


6. Let G = (V, T, S, P) be any context-free grammar without any λ-productions or unit-productions.
Let k be the maximum number of symbols on the right of any production in P. Show that there is an
equivalent grammar in Chomsky normal form with no more than (k − 1)|P| + |T| production
rules.
7. Draw the dependency graph for the grammar in Exercise 4. 


8. A linear language is one for which there exists a linear grammar (for a definition, see Example 
3.14). Let L be any linear language not containing λ. Show that there exists a grammar G = (V, T,
S, P) all of whose productions have one of the forms

A → aB,
A → Ba,
A → a,


where a € T, A, B € V, such that L = L (G). 


9. Show that for every context-free grammar G = (V, T, S, P) there is an equivalent one in which all 
productions have the form 


A → aBC,

or

A → λ,

where a ∈ Σ ∪ {λ}, A, B, C ∈ V.

10. Convert the grammar
S → aSb | bSa | a | b
into Greibach normal form.

11. Convert the following grammar into Greibach normal form.
S → aSb | ab.

12. Convert the grammar
S → ab | aS | aaS
into Greibach normal form.

13. Convert the grammar

S → ABb | a,
A → aaA | B,
B → bAb

into Greibach normal form.

14. Can every linear grammar be converted to a form in which all productions look like A → ax,
where a ∈ T and x ∈ V ∪ {λ}?

15. A context-free grammar is said to be in two-standard form if all production rules satisfy the
following pattern

A → aBC,
A → aB,
A → a,

where A, B, C ∈ V and a ∈ T.
Convert the grammar G = ({S, A, B, C}, {a, b}, S, P) with P given as
S → aSA,
A → ABC,
B → b,
C → aBC

into two-standard form.


*16. Two-standard form is general; for any context-free grammar G with λ ∉ L(G), there exists an
equivalent grammar in two-standard form. Prove this.


6.3 A Membership Algorithm for Context-Free Grammars* 

In Section 5.2, we claim, without any elaboration, that membership and parsing algorithms for
context-free grammars exist that require approximately |w|³ steps to parse a string w. We are now in
a position to justify this claim. The algorithm we will describe here is called the CYK algorithm,
after its originators J. Cocke, D. H. Younger, and T. Kasami. The algorithm works only if the
grammar is in Chomsky normal form and succeeds by breaking one problem into a sequence of
smaller ones in the following way. Assume that we have a grammar G = (V, T, S, P) in Chomsky
normal form and a string

w = a₁a₂···aₙ.

We define substrings

wᵢⱼ = aᵢ···aⱼ,

and subsets of V

Vᵢⱼ = {A ∈ V : A ⇒* wᵢⱼ}.

Clearly, w ∈ L(G) if and only if S ∈ V₁ₙ.

To compute Vᵢⱼ, observe that A ∈ Vᵢᵢ if and only if G contains a production A → aᵢ. Therefore, Vᵢᵢ
can be computed for all 1 ≤ i ≤ n by inspection of w and the productions of the grammar. To continue,
notice that for j > i, A derives wᵢⱼ if and only if there is a production A → BC, with B ⇒* wᵢₖ and
C ⇒* wₖ₊₁,ⱼ for some k with i ≤ k, k < j. In other words,

Vᵢⱼ = ⋃_{k ∈ {i, i+1, ..., j−1}} {A : A → BC, with B ∈ Vᵢₖ, C ∈ Vₖ₊₁,ⱼ}. (6.8)

An inspection of the indices in (6.8) shows that it can be used to compute all the Vᵢⱼ if we proceed in
the sequence

1. Compute V₁₁, V₂₂, ..., Vₙₙ,
2. Compute V₁₂, V₂₃, ..., Vₙ₋₁,ₙ,
3. Compute V₁₃, V₂₄, ..., Vₙ₋₂,ₙ,

and so on.
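
The order of computation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the names cyk_table and cyk_member are hypothetical, the grammar is assumed to be in Chomsky normal form and given as a dictionary from each variable to a list of right sides (tuples of one terminal or two variables), and the table is indexed by pairs (i, j) with 1-based indices to match the text.

```python
def cyk_table(productions, w):
    """Compute all sets V_ij for the string w, in order of substring length."""
    n = len(w)
    table = {}

    # Substrings of length 1: V_ii holds every A with a production A -> a_i.
    for i in range(1, n + 1):
        table[(i, i)] = {
            left for left, rhss in productions.items() if (w[i - 1],) in rhss
        }

    # Longer substrings, shortest first, using the union in (6.8).
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):
                for left, rhss in productions.items():
                    for rhs in rhss:
                        if (
                            len(rhs) == 2
                            and rhs[0] in table[(i, k)]
                            and rhs[1] in table[(k + 1, j)]
                        ):
                            cell.add(left)
            table[(i, j)] = cell
    return table


def cyk_member(productions, start, w):
    """Return True if w is generated by the CNF grammar."""
    if not w:
        return False  # a grammar in Chomsky normal form cannot derive lambda
    return start in cyk_table(productions, w)[(1, len(w))]
```

Storing the sets in a dictionary keyed by (i, j) keeps the code close to the notation of (6.8); an array-based table would serve equally well.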


Example 6.11 


Determine whether the string w = aabbb is in the language generated by the grammar 


S → AB,
A → BB | a,
B → AB | b.

First note that w₁₁ = a, so V₁₁ is the set of all variables that immediately derive a, that is, V₁₁ =
{A}. Since w₂₂ = a, we also have V₂₂ = {A} and, similarly,

V₁₁ = {A}, V₂₂ = {A}, V₃₃ = {B}, V₄₄ = {B}, V₅₅ = {B}.
Now we use (6.8) to get
V₁₂ = {A : A → BC, B ∈ V₁₁, C ∈ V₂₂}.

Since V₁₁ = {A} and V₂₂ = {A}, the set consists of all variables that occur on the left side of a
production whose right side is AA. Since there are none, V₁₂ is empty. Next,

V₂₃ = {A : A → BC, B ∈ V₂₂, C ∈ V₃₃},

so the required right side is AB, and we have V₂₃ = {S, B}. A straightforward argument along these
lines then gives

V₁₂ = ∅, V₂₃ = {S, B}, V₃₄ = {A}, V₄₅ = {A},
V₁₃ = {S, B}, V₂₄ = {A}, V₃₅ = {S, B},
V₁₄ = {A}, V₂₅ = {S, B},
V₁₅ = {S, B},
so that w ∈ L(G).


The CYK algorithm, as described here, determines membership for any language generated by a 
grammar in Chomsky normal form. With some additions to keep track of how the elements of Vᵢⱼ are
derived, it can be converted into a parsing method. To see that the CYK membership algorithm
requires O(n³) steps, notice that exactly n(n + 1)/2 sets of Vᵢⱼ have to be computed. Each involves
the evaluation of at most n terms in (6.8), so the claimed result follows.


EXERCISES 


1. Use the CYK algorithm to determine whether the strings aabb, aabba, and abbbb are in the 
language generated by the grammar in Example 6.11. 

2. Use the CYK algorithm to find a parsing of the string aab, using the grammar of Example 6.11. 

3. Use the approach employed in Exercise 2 to show how the CYK membership algorithm can be 
made into a parsing method. 

4. Use the CYK method to determine if the string w = aaabbbbab is in the language generated by the 
grammar S → aSb | b.


Chapter 7 


Pushdown 
Automata 


The description of context-free languages by means of context-free grammars is convenient,
as illustrated by the use of BNF in programming language definition. The next question is
whether there is a class of automata that can be associated with context-free languages. As
we have seen, finite automata cannot recognize all context-free languages. Intuitively, we
understand that this is because finite automata have strictly finite memories, whereas the
recognition of a context-free language may require storing an unbounded amount of information. For
example, when scanning a string from the language L = {aⁿbⁿ : n ≥ 0}, we must not only check that all
a's precede the first b, we must also count the number of a's. Since n is unbounded, this counting
cannot be done with a finite memory. We want a machine that can count without limit. But as we see
from other examples, such as {wwᴿ}, we need more than unlimited counting ability: We need the
ability to store and match a sequence of symbols in reverse order. This suggests that we might try a
stack as a storage mechanism, allowing unbounded storage that is restricted to operating like a stack.
This gives us a class of machines called pushdown automata (pda).


In this chapter, we explore the connection between pushdown automata and context-free 
languages. We first show that if we allow pushdown automata to act nondeterministically, we get a 
class of automata that accepts exactly the family of context-free languages. But we will also see that 
here there is no longer an equivalence between the deterministic and nondeterministic versions. The 
class of deterministic pushdown automata defines a new family of languages, the deterministic 
context-free languages, forming a proper subset of the context-free languages. Since this is an 
important family for the treatment of programming languages, we conclude the chapter with a brief 
introduction to the grammars associated with deterministic context-free languages. 


7.1 Nondeterministic Pushdown Automata 


A schematic representation of a pushdown automaton is given in Figure 7.1. Each move of the control 
unit reads a symbol from the input file, while at the same time changing the contents of the stack 
through the usual stack operations. Each move of the control unit is determined by the current input 
symbol as well as by the symbol currently on top of the stack. The result of the move is a new state of 
the control unit and a change in the top of the stack. 


Definition of a Pushdown Automaton 


Formalizing this intuitive notion gives us a precise definition of a pushdown automaton. 


Figure 7.1 (schematic; labeled parts include the input file and the control unit)


Definition 7.1 


A nondeterministic pushdown accepter (npda) is defined by the septuple 
M = (Q, Σ, Γ, δ, q₀, z, F),

where
Q is a finite set of internal states of the control unit,
Σ is the input alphabet,
Γ is a finite set of symbols called the stack alphabet,
δ: Q × (Σ ∪ {λ}) × Γ → set of finite subsets of Q × Γ* is the transition function,
q₀ ∈ Q is the initial state of the control unit,
z ∈ Γ is the stack start symbol,
F ⊆ Q is the set of final states.


The complicated formal appearance of the domain and range of δ merits a closer examination.
The arguments of δ are the current state of the control unit, the current input symbol, and the current
symbol on top of the stack. The result is a set of pairs (q, x), where q is the next state of the control
unit and x is a string that is put on top of the stack in place of the single symbol there before. Note that
the second argument of δ may be λ, indicating that a move that does not consume an input symbol is
possible. We will call such a move a λ-transition. Note also that δ is defined so that it needs a stack
symbol; no move is possible if the stack is empty. Finally, the requirement that the elements of the
range of δ be a finite subset is necessary because Q × Γ* is an infinite set and therefore has infinite
subsets. While an npda may have several choices for its moves, this choice must be restricted to a
finite set of possibilities.


Example 7.1 


Suppose the set of transition rules of an npda contains 
δ(q₁, a, b) = {(q₂, cd), (q₃, λ)}.

If at any time the control unit is in state q₁, the input symbol read is a, and the symbol on top of the
stack is b, then one of two things can happen: (1) the control unit goes into state q₂ and the string cd
replaces b on top of the stack, or (2) the control unit goes into state q₃ with the symbol b removed
from the top of the stack. In our notation we assume that the insertion of a string into a stack is done
symbol by symbol, starting at the right end of the string.


Example 7.2 


Consider an npda with 


Q = {q₀, q₁, q₂, q₃},
Σ = {a, b},
Γ = {0, 1},
z = 0,
F = {q₃},

with initial state q₀ and

δ(q₀, a, 0) = {(q₁, 10), (q₃, λ)},
δ(q₀, λ, 0) = {(q₃, λ)},
δ(q₁, a, 1) = {(q₁, 11)},
δ(q₁, b, 1) = {(q₂, λ)},
δ(q₂, b, 1) = {(q₂, λ)},
δ(q₂, λ, 0) = {(q₃, λ)}.


What can we say about the action of this automaton? 


First, notice that transitions are not specified for all possible combinations of input and stack
symbols. For instance, there is no entry given for δ(q₀, b, 0). The interpretation of this is the same
that we used for nondeterministic finite automata: An unspecified transition is to the null set and
represents a dead configuration for the npda.

The crucial transitions are
δ(q₁, a, 1) = {(q₁, 11)},
which adds a 1 to the stack when an a is read, and

δ(q₂, b, 1) = {(q₂, λ)},

which removes a 1 when a b is encountered. These two steps count the number of a's and match that
count against the number of b's. The control unit is in state q₁ until the first b is encountered at which
time it goes into state q₂. This assures that no b precedes the last a. After analyzing the remaining
transitions, we see that the npda will end in the final state q₃ if and only if the input string is in the
language

L = {aⁿbⁿ : n ≥ 0} ∪ {a}.
As an analogy with finite automata, we might say that the npda accepts the above language. Of course,
before making such a claim, we must define what we mean by an npda accepting a language.


We can also use transition graphs to represent npda's. In this representation we label the edges of 
the graph with three things: the current input symbol, the symbol at the top of the stack, and the string 
that replaces the top of the stack. 


Example 7.3 


The npda in Example 7.2 is represented by the transition graph in Figure 7.2. 
Figure 7.2 (transition graph; edge labels include a, 0, 10; a, 1, 11; b, 1, λ; λ, 0, λ)


While transition graphs are convenient for describing npda's, they are less useful for making 
arguments. The fact that we have to keep track, not only of the internal states, but also of the stack 
contents, limits the usefulness of transition graphs for formal reasoning. Instead, we introduce a 
succinct notation for describing the successive configurations of an npda during the processing of a 
string. The relevant factors at any time are the current state of the control unit, the unread part of the 
input string, and the current contents of the stack. Together these completely determine all the possible 
ways in which the npda can proceed. The triplet 


(q, w, u),

where q is the state of the control unit, w is the unread part of the input string, and u is the stack
contents (with the leftmost symbol indicating the top of the stack), is called an instantaneous
description of a pushdown automaton. A move from one instantaneous description to another will be
denoted by the symbol ⊢; thus

(q₁, aw, bx) ⊢ (q₂, w, yx)
is possible if and only if
(q₂, y) ∈ δ(q₁, a, b).
Moves involving an arbitrary number of steps will be denoted by ⊢*. The expression
(q₁, w₁, x₁) ⊢* (q₂, w₂, x₂)

indicates a possible configuration change over a number of steps.¹

On occasions where several automata are under consideration we will use ⊢_M to emphasize that
the move is made by the particular automaton M.


The Language Accepted by a Pushdown Automaton 


Definition 7.2 


Let M = (Q, Σ, Γ, δ, q₀, z, F) be a nondeterministic pushdown automaton. The language accepted by
M is the set

L(M) = {w ∈ Σ* : (q₀, w, z) ⊢*_M (p, λ, u), p ∈ F, u ∈ Γ*}.


In words, the language accepted by M is the set of all strings that can put M into a final state at the end 
of the string. The final stack content u is irrelevant to this definition of acceptance. 
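
Acceptance by final state can be explored with a small simulator that searches the instantaneous descriptions reachable from (q₀, w, z). The sketch below is an illustration under several assumptions, not a definitive implementation: the name npda_accepts is hypothetical, stack symbols are single characters, λ is written as the empty string, and δ is a dictionary from (state, input symbol or "", stack top) to a set of (next state, string to push). The bound max_stack is a heuristic added only to guarantee termination; it is not part of the definition, and automata whose stacks grow faster need a larger value.

```python
from collections import deque


def npda_accepts(delta, start_state, start_symbol, final_states, w, max_stack=None):
    """Breadth-first search over instantaneous descriptions (state, position, stack)."""
    if max_stack is None:
        max_stack = 2 * len(w) + 2  # heuristic bound, an assumption of this sketch
    start = (start_state, 0, start_symbol)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        # Accept when the input is used up and the state is final;
        # the remaining stack contents are irrelevant.
        if pos == len(w) and state in final_states:
            return True
        if not stack:
            continue  # no move is possible if the stack is empty
        top, rest = stack[0], stack[1:]  # leftmost symbol is the top
        moves = []
        # Lambda-transitions do not consume input.
        for nxt, push in delta.get((state, "", top), ()):
            moves.append((nxt, pos, push + rest))
        # Ordinary transitions consume the next input symbol.
        if pos < len(w):
            for nxt, push in delta.get((state, w[pos], top), ()):
                moves.append((nxt, pos + 1, push + rest))
        for conf in moves:
            if conf not in seen and len(conf[2]) <= max_stack:
                seen.add(conf)
                queue.append(conf)
    return False
```

Writing the pushed string to the left of the remaining stack mirrors the convention that the leftmost symbol of the stack contents is the top.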


Example 7.4 


Construct an npda for the language 


L = {w ∈ {a, b}* : n_a(w) = n_b(w)}.


As in Example 7.2, the solution to this problem involves counting the number of a’s and b’s, which is 
easily done with a stack. Here we need not even worry about the order of the a’s and b’s. We can 
insert a counter symbol, say 0, into the stack whenever an a is read, then pop one counter symbol from 
the stack when a b is found. The only difficulty with this is that if there is a prefix of w with more b’s 
than a’s, we will not find a 0 to use. But this is easy to fix; we can use a negative counter symbol, say 
1, for counting the b’s that are to be matched against a’s later. The complete solution is given in the 
transition graph in Figure 7.3. 


Figure 7.3 (transition graph; edge labels include a, 0, 00; b, 1, 11; a, z, 0z; b, 0, λ; b, z, 1z; a, 1, λ)

In processing the string baab, the npda makes the moves

(q₀, baab, z) ⊢ (q₀, aab, 1z) ⊢ (q₀, ab, z)
⊢ (q₀, b, 0z) ⊢ (q₀, λ, z) ⊢ (q_f, λ, z)


and hence the string is accepted. 
Example 7.5 


To construct an npda for accepting the language 
L={wwk: we fa, b}*} 


we use the fact that the symbols are retrieved from a stack in the reverse order of their insertion. 
When reading the first part of the string, we push consecutive symbols on the stack. For the second 
part, we compare the current input symbol with the top of the stack, continuing as long as the two 
match. Since symbols are retrieved from the stack in reverse of the order in which they were inserted, 
a complete match will be achieved if and only if the input is of the form ww^R.

An apparent difficulty with this suggestion is that we do not know the middle of the string, that is,
where w ends and w^R starts. But the nondeterministic nature of the automaton helps us with this; the
npda correctly guesses where the middle is and switches states at that point. A solution to the
problem is given by M = (Q, Σ, Γ, δ, q0, z, F), where


Q = {q0, q1, q2},
Σ = {a, b},
Γ = {a, b, z},
F = {q2}.


The transition function can be visualized as having several parts: a set to push w on the stack,


δ(q0, a, a) = {(q0, aa)},
δ(q0, b, a) = {(q0, ba)},
δ(q0, a, b) = {(q0, ab)},
δ(q0, b, b) = {(q0, bb)},
δ(q0, a, z) = {(q0, az)},
δ(q0, b, z) = {(q0, bz)},


a set to guess the middle of the string, where the npda switches from state q0 to q1,


δ(q0, λ, a) = {(q1, a)},
δ(q0, λ, b) = {(q1, b)},


a set to match w^R against the contents of the stack,


δ(q1, a, a) = {(q1, λ)},
δ(q1, b, b) = {(q1, λ)},


and finally


δ(q1, λ, z) = {(q2, z)},


to recognize a successful match.
The sequence of moves in accepting abba is 


(q0, abba, z) ⊢ (q0, bba, az) ⊢ (q0, ba, baz)
⊢ (q1, ba, baz) ⊢ (q1, a, az) ⊢ (q1, λ, z) ⊢ (q2, λ, z).


The nondeterministic alternative for locating the middle of the string is taken at the third move. At that
stage, the pda has the instantaneous description (q0, ba, baz) and has two choices for its next move.


One is to use δ(q0, b, b) = {(q0, bb)} and make the move
(q0, ba, baz) ⊢ (q0, a, bbaz);


the second is the one used above, namely δ(q0, λ, b) = {(q1, b)}. Only the latter leads to acceptance
of the input.
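
One way to make the "guess" concrete is to explore every choice systematically. The following Python sketch does this by a breadth-first search over instantaneous descriptions. The names `DELTA` and `accepts`, and the encoding of λ as the empty string, are our own assumptions. The search is guaranteed to stop only when λ-moves cannot grow the stack without bound, which holds for this automaton.

```python
from collections import deque

LAMBDA = ""  # the empty string stands for lambda

# Transition table of Example 7.5: (state, input symbol or LAMBDA, stack top)
# maps to a set of (next state, string that replaces the stack top).
DELTA = {
    ("q0", "a", "a"): {("q0", "aa")},
    ("q0", "b", "a"): {("q0", "ba")},
    ("q0", "a", "b"): {("q0", "ab")},
    ("q0", "b", "b"): {("q0", "bb")},
    ("q0", "a", "z"): {("q0", "az")},
    ("q0", "b", "z"): {("q0", "bz")},
    ("q0", LAMBDA, "a"): {("q1", "a")},
    ("q0", LAMBDA, "b"): {("q1", "b")},
    ("q1", "a", "a"): {("q1", LAMBDA)},
    ("q1", "b", "b"): {("q1", LAMBDA)},
    ("q1", LAMBDA, "z"): {("q2", "z")},
}


def accepts(delta, start, stack_start, finals, w):
    """Search all computations; the stack is a string whose first character is the top."""
    first = (start, w, stack_start)
    seen = {first}
    queue = deque([first])
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and state in finals:
            return True                      # final state reached at the end of the input
        if not stack:
            continue                         # no move is defined on an empty stack
        top, below = stack[0], stack[1:]
        options = [(LAMBDA, rest)]           # a lambda-move leaves the input untouched
        if rest:
            options.append((rest[0], rest[1:]))
        for symbol, remaining in options:
            for nxt, push in delta.get((state, symbol, top), ()):
                config = (nxt, remaining, push + below)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False


print(accepts(DELTA, "q0", "z", {"q2"}, "abba"))
```

Each element of the queue is one instantaneous description, so the two choices available at (q0, ba, baz) simply become two entries to be explored.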


EXERCISES 


1. Find a pda with fewer than four states that accepts the same language as the pda in Example 7.2. 
2. Prove that the pda in Example 7.5 does not accept any string not in {ww^R}.
3. Construct npda's that accept the following regular languages.


(a) L1 = L(aaa*b).
(b) L2 = L(aab*aba*).
(c) The union of L1 and L2.
(d) L1 − L2.
4. Construct npda's that accept the following languages on Σ = {a, b, c}.
(a) L = {a^n b^(2n) : n ≥ 0}.
(b) L = {wcw^R : w ∈ {a, b}*}.
(c) L = {a^n b^m c^(n+m) : n ≥ 0, m ≥ 0}.
(d) L = {a^n b^(n+m) c^m : n ≥ 0, m ≥ 1}.
(e) L = {a^3 b^n c^n : n ≥ 0}.
(f) L = {a^n b^m : n ≤ m ≤ 3n}.
(g) L = {w : n_a(w) = n_b(w) + 1}.
(h) L = {w : n_a(w) = 2n_b(w)}.
(i) L = {w : n_a(w) + n_b(w) = n_c(w)}.
(j) L = {w : 2n_a(w) ≤ n_b(w) ≤ 3n_a(w)}.
(k) L = {w : n_a(w) < n_b(w)}.
5. Construct an npda that accepts the language L = {a^n b^m : n ≥ 0, n ≠ m}.
6. Find an npda on Σ = {a, b, c} that accepts the language


L = {w1cw2 : w1, w2 ∈ {a, b}*, w1 ≠ w2^R}.
7. Find an npda for the concatenation of L(a*) and the language in Exercise 6.


8. Find an npda for the language L = {ab(ab)^n b(ba)^n : n ≥ 0}.
9. Is it possible to find a dfa that accepts the same language as the pda


M = ({q0, q1}, {a, b}, {z}, δ, q0, z, {q1}),
with
δ(q0, a, z) = {(q1, z)},
δ(q0, b, z) = {(q0, z)},
δ(q1, a, z) = {(q1, z)},
δ(q1, b, z) = {(q0, z)}?


10. What language is accepted by the pda


M = ({q0, q1, q2, q3, q4, q5}, {a, b}, {0, 1, a, z}, δ, q0, z, {q5}),
with


δ(q0, b, z) = {(q1, 1z)},
δ(q1, b, 1) = {(q1, 11)},
δ(q2, a, 1) = {(q3, λ)},
δ(q3, a, 1) = {(q4, λ)},
δ(q4, a, z) = {(q4, z), (q5, z)}?


11. What language is accepted by the npda M = ({q0, q1, q2}, {a, b}, {a, b, z}, δ, q0, z, {q2}) with
transitions
δ(q0, a, z) = {(q1, a), (q2, λ)},
δ(q1, b, a) = {(q1, b)},
δ(q1, b, b) = {(q1, b)},
δ(q1, a, b) = {(q2, λ)}?


12. What language is accepted by the npda in Example 7.4 if we use F = {q0, qf}?
13. What language is accepted by the npda in Exercise 11 above if we use F = {q0, q1, q2}?


14. Find an npda with no more than two internal states that accepts the language L (aa*ba*). 


15. Suppose that in Example 7.2 we replace the given value of δ(q2, λ, 0) with
δ(q2, λ, 0) = {(q0, λ)}.
What is the language accepted by this new pda?


16. We can define a restricted npda as one that can increase the length of the stack by at most one
symbol in each move, changing Definition 7.1 so that


δ : Q × (Σ ∪ {λ}) × Γ → 2^(Q × (ΓΓ ∪ Γ ∪ {λ})).
The interpretation of this is that the range of δ consists of sets of pairs of the form (qi, ab), (qi,
a), or (qi, λ). Show that for every npda M there exists such a restricted npda M̂ such that L(M)
= L(M̂).

17. An alternative to Definition 7.2 for language acceptance is to require the stack to be empty when
the end of the input string is reached. Formally, an npda M is said to accept the language N(M) by
empty stack if

N(M) = {w ∈ Σ* : (q0, w, z) ⊢*_M (p, λ, λ)},


where p is any element in Q. Show that this notion is effectively equivalent to Definition 7.2, in
the sense that for any npda M there exists an npda M̂ such that L(M) = N(M̂), and vice versa.


7.2 Pushdown Automata and Context-Free Languages 


In the examples of the previous section, we saw that pushdown automata exist for some of the familiar 
context-free languages. This is no accident. There is a general relation between context-free 
languages and nondeterministic pushdown accepters that is established in the next two major results. 
We will show that for every context-free language there is an npda that accepts it, and conversely, 
that the language accepted by any npda is context-free. 


Pushdown Automata for Context-Free Languages 


We first show that for every context-free language there is an npda that accepts it. The underlying idea 
is to construct an npda that can, in some way, carry out a leftmost derivation of any string in the 
language. To simplify the argument a little, we assume that the language is generated by a grammar in 
Greibach normal form. 

The pda we are about to construct will represent the derivation by keeping the variables in the 
right part of the sentential form on its stack, while the left part, consisting entirely of terminals, is 
identical with the input read. We begin by putting the start symbol on the stack. After that, to simulate 
the application of a production A → ax, we must have the variable A on top of the stack and the
terminal a as the input symbol. The variable on the stack is removed and replaced by the variable
string x. What δ should be to achieve this is easy to see. Before we present the general argument, let
us look at a simple example. 


Example 7.6 


Construct a pda that accepts the language generated by a grammar with productions
S → aSbb | a.


We first transform the grammar into Greibach normal form, changing the productions to


S → aSA | a,
A → bB,
B → b.


The corresponding automaton will have three states {q0, q1, q2}, with initial state q0 and final state q2.
First, the start symbol S is put on the stack by


δ(q0, λ, z) = {(q1, Sz)}.


The production S → aSA will be simulated in the pda by removing S from the stack and replacing it
with SA, while reading a from the input. Similarly, the rule S → a should cause the pda to read an a
while simply removing S. Thus, the two productions are represented in the pda by
δ(q1, a, S) = {(q1, SA), (q1, λ)}.
In an analogous manner, the other productions give


δ(q1, b, A) = {(q1, B)},
δ(q1, b, B) = {(q1, λ)}.


The appearance of the stack start symbol on top of the stack signals the completion of the derivation
and the pda is put into its final state by


δ(q1, λ, z) = {(q2, λ)}.


The construction of this example can be adapted to other cases, leading to a general result.


Theorem 7.1 


For any context-free language L, there exists an npda M such that 


L=L(M). 


Proof: If L is a λ-free context-free language, there exists a context-free grammar in Greibach normal
form for it. Let G = (V, T, S, P ) be such a grammar. We then construct an npda that simulates leftmost 
derivations in this grammar. As suggested, the simulation will be done so that the unprocessed part of 
the sentential form is in the stack, while the terminal prefix of any sentential form matches the 
corresponding prefix of the input string. 


Specifically, the npda will be
M = ({q0, q1, qf}, T, V ∪ {z}, δ, q0, z, {qf}),


where z ∉ V. Note that the input alphabet of M is identical with the set of terminals of G and that the
stack alphabet contains the set of variables of the grammar.


The transition function will include
δ(q0, λ, z) = {(q1, Sz)}, (7.1)


so that after the first move of M, the stack contains the start symbol S of the derivation. (The stack
start symbol z is a marker to allow us to detect the end of the derivation.) In addition, the set of
transition rules is such that


(q1, u) ∈ δ(q1, a, A), (7.2)
whenever


A → au


is in P. This reads input a and removes the variable A from the stack, replacing it with u. In this way
it generates the transitions that allow the pda to simulate all derivations. Finally, we have


δ(q1, λ, z) = {(qf, z)}, (7.3)


to get M into a final state.
To show that M accepts any w ∈ L(G), consider the partial leftmost derivation


S ⇒* a1a2···an A1A2···Am
⇒ a1a2···an b B1···Bk A2···Am.


If M is to simulate this derivation, then after reading a1a2···an, the stack must contain A1A2···Am. To
take the next step in the derivation, G must have a production
A1 → bB1···Bk.
But the construction is such that then M has a transition rule in which
(q1, B1···Bk) ∈ δ(q1, b, A1),


so that the stack now contains B1···Bk A2···Am after having read a1a2···an b.


A simple induction argument on the number of steps in the derivation then shows that if
S ⇒* w,
then
(q1, w, Sz) ⊢* (q1, λ, z).
Using (7.1) and (7.3) we have
(q0, w, z) ⊢ (q1, w, Sz) ⊢* (q1, λ, z) ⊢ (qf, λ, z),


so that L(G) ⊆ L(M).
To prove that L(M) ⊆ L(G), let w ∈ L(M). Then by definition


(q0, w, z) ⊢* (qf, λ, u).


But there is only one way to get from q0 to q1 and only one way from q1 to qf. Therefore, we must
have


(q1, w, Sz) ⊢* (q1, λ, z).


Now let us write w = a1a2a3···an. Then the first step in


(q1, a1a2a3···an, Sz) ⊢* (q1, λ, z) (7.4)
must be a rule of the form (7.2) to get
(q1, a1a2a3···an, Sz) ⊢ (q1, a2a3···an, u1z).
But then the grammar has a rule of the form S → a1u1, so that
S ⇒ a1u1.
Repeating this, writing u1 = Au2, we have
(q1, a2a3···an, Au2z) ⊢ (q1, a3···an, u3u2z),
implying that A → a2u3 is in the grammar and that
S ⇒* a1a2u3u2.


This makes it quite clear that at any point the stack contents (excluding z) are identical with the unmatched
part of the sentential form, so that (7.4) implies


S ⇒* a1a2···an.


In consequence, L(M) ⊆ L(G), completing the proof if the language does not contain λ.
If λ ∈ L, we add to the constructed npda the transition


δ(q0, λ, z) = {(qf, z)}
so that the empty string is also accepted. ■


1 Because of the nondeterminism, such a change is of course not necessary.


Example 7.7


Consider the grammar


S → aA,
A → aABC | bB | a,
B → b,
C → c.


Since the grammar is already in Greibach normal form, we can use the construction in the previous
theorem immediately. In addition to rules


δ(q0, λ, z) = {(q1, Sz)}


and
δ(q1, λ, z) = {(qf, z)},


the pda will also have transition rules


δ(q1, a, S) = {(q1, A)},
δ(q1, a, A) = {(q1, ABC), (q1, λ)},
δ(q1, b, A) = {(q1, B)},
δ(q1, b, B) = {(q1, λ)},
δ(q1, c, C) = {(q1, λ)}.


The sequence of moves made by M in processing aaabc is
(q0, aaabc, z) ⊢ (q1, aaabc, Sz)
⊢ (q1, aabc, Az)
⊢ (q1, abc, ABCz)
⊢ (q1, bc, BCz)
⊢ (q1, c, Cz)
⊢ (q1, λ, z)
⊢ (qf, λ, z).


This corresponds to the derivation


S ⇒ aA ⇒ aaABC ⇒ aaaBC ⇒ aaabC ⇒ aaabc.
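
The construction of Theorem 7.1 is mechanical enough to be written as a short program. The Python sketch below builds the transition table from productions in Greibach normal form and then searches for an accepting computation. The function names, the dictionary encoding of the grammar, and the restriction to one-character variable names are assumptions made for illustration.

```python
LAMBDA = ""  # the empty string stands for lambda


def npda_from_gnf(productions, start="S"):
    """Build the transitions (7.1)-(7.3) from Greibach-normal-form productions.

    productions maps a variable to a list of right sides such as "aABC":
    one terminal followed by zero or more one-character variables.
    """
    delta = {("q0", LAMBDA, "z"): {("q1", start + "z")}}            # rule (7.1)
    for variable, right_sides in productions.items():
        for rhs in right_sides:
            terminal, tail = rhs[0], rhs[1:]
            key = ("q1", terminal, variable)
            delta.setdefault(key, set()).add(("q1", tail))          # rule (7.2)
    delta[("q1", LAMBDA, "z")] = {("qf", "z")}                      # rule (7.3)
    return delta


def accepts(delta, w):
    """Depth-first search over configurations (state, unread input, stack)."""
    pending = [("q0", w, "z")]
    seen = set(pending)
    while pending:
        state, rest, stack = pending.pop()
        if state == "qf" and rest == "":
            return True
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        moves = [(LAMBDA, rest)] + ([(rest[0], rest[1:])] if rest else [])
        for symbol, remaining in moves:
            for nxt, push in delta.get((state, symbol, top), ()):
                config = (nxt, remaining, push + below)
                if config not in seen:
                    seen.add(config)
                    pending.append(config)
    return False


GRAMMAR_7_7 = {"S": ["aA"], "A": ["aABC", "bB", "a"], "B": ["b"], "C": ["c"]}
DELTA_7_7 = npda_from_gnf(GRAMMAR_7_7)
print(accepts(DELTA_7_7, "aaabc"))
```

Because every move in state q1 other than the final λ-move consumes one input symbol, the search always terminates for tables produced this way.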


In order to simplify the arguments, the proof in Theorem 7.1 assumed that the grammar was in
Greibach normal form. It is not necessary to do this; we can make a similar and only slightly more
complicated construction from a general context-free grammar. For example, for productions of the
form


A → Bx,


we remove A from the stack and replace it with Bx, but consume no input symbol. For
productions of the form


A → abCx,


we must first match the ab in the input against a similar string in the stack and then replace A with Cx.
We leave the details of the construction and the associated proof as an exercise.


Context-Free Grammars for Pushdown Automata 


The converse of Theorem 7.1 is also true. The construction involved readily suggests itself: Reverse 
the process in Theorem 7.1 so that the grammar simulates the moves of the pda. This means that the 
content of the stack should be reflected in the variable part of the sentential form, while the processed 
input is the terminal prefix of the sentential form. Quite a few details are needed to make this work. 


To keep the discussion as simple as possible, we will assume that the npda in question meets the 
following requirements: 


1. It has a single final state qf that is entered if and only if the stack is empty;


2. With a ∈ Σ ∪ {λ}, all transitions must have the form δ(qi, a, A) = {c1, c2, ..., cn}, where


ci = (qj, λ), (7.5)
or
ci = (qj, BC). (7.6)


That is, each move either increases or decreases the stack content by a single symbol.


These restrictions may appear to be very severe, but they are not. It can be shown that for any 
npda there exists an equivalent one having properties 1 and 2. This equivalence is explored partially 
in Exercises 16 and 17 in Section 7.1. Here we need to explore it further, but again we will leave the 
arguments as an exercise (see Exercise 16 at the end of this section). Taking this as given, we now 
construct a context-free grammar for the language accepted by the npda. 


As stated, we want the sentential form to represent the content of the stack. But the configuration 
of the npda also involves an internal state, and this has to be remembered in the sentential form as 
well. It is hard to see how this can be done, and the construction we give here is a little tricky. 


Suppose for the moment that we can find a grammar whose variables are of the form (qiAqj) and
whose productions are such that


(qiAqj) ⇒* v


if and only if the npda erases A from the stack while reading v and going from state qi to state qj.


"Erasing" here means that A and its effects (i.e., all the successive strings by which it is replaced) are
removed from the stack, bringing the symbol originally below A to the top. If we can find such a
grammar, and if we choose (q0zqf) as its start symbol, then


(q0zqf) ⇒* w


if and only if the npda removes z (creating an empty stack) while reading w and going from q0 to qf.
But this is exactly how the npda accepts w. Therefore, the language generated by the grammar will be
identical to the language accepted by the npda.

To construct a grammar that satisfies these conditions, we examine the different types of
transitions that can be made by the npda. Since (7.5) involves an immediate erasure of A, the grammar
will have a corresponding production


(qiAqj) → a.


Transitions of type (7.6) generate the set of rules
(qiAqk) → a(qjBql)(qlCqk),


where qk and ql take on all possible values in Q. This is due to the fact that to erase A we first replace
it with BC, while reading an a and going from state qi to qj. Subsequently, we go from qj to ql, erasing
B, then from ql to qk, erasing C.

In the last step, it may seem that we have added too much, as there may be some states ql that
cannot be reached from qj while erasing B. This is true, but this does not affect the grammar. The
resulting variables (qjBql) are useless variables and do not affect the language generated by the
grammar.

Finally, as a start variable we take (q0zqf), where qf is the single final state of the npda.


Example 7.8 


Consider the npda with transitions


δ(q0, a, z) = {(q0, Az)},
δ(q0, a, A) = {(q0, A)},
δ(q0, b, A) = {(q1, λ)},
δ(q1, λ, z) = {(q2, λ)}.


Using q0 as the initial state and q2 as the final state, the npda satisfies condition 1 above, but not 2. To
satisfy the latter, we introduce a new state q3 and an intermediate step in which we first remove the A
from the stack, then replace it in the next move. The new set of transition rules is


δ(q0, a, z) = {(q0, Az)},
δ(q3, λ, z) = {(q0, Az)},
δ(q0, a, A) = {(q3, λ)},
δ(q0, b, A) = {(q1, λ)},
δ(q1, λ, z) = {(q2, λ)}.


The last three transitions are of the form (7.5) so that they yield the corresponding productions
(q0Aq3) → a, (q0Aq1) → b, (q1zq2) → λ.


From the first two transitions we get the set of productions


(q0zq0) → a(q0Aq0)(q0zq0) | a(q0Aq1)(q1zq0) |
a(q0Aq2)(q2zq0) | a(q0Aq3)(q3zq0),
(q0zq1) → a(q0Aq0)(q0zq1) | a(q0Aq1)(q1zq1) |
a(q0Aq2)(q2zq1) | a(q0Aq3)(q3zq1),
(q0zq2) → a(q0Aq0)(q0zq2) | a(q0Aq1)(q1zq2) |
a(q0Aq2)(q2zq2) | a(q0Aq3)(q3zq2),
(q0zq3) → a(q0Aq0)(q0zq3) | a(q0Aq1)(q1zq3) |
a(q0Aq2)(q2zq3) | a(q0Aq3)(q3zq3),


(q3zq0) → (q0Aq0)(q0zq0) | (q0Aq1)(q1zq0) | (q0Aq2)(q2zq0) | (q0Aq3)(q3zq0),
(q3zq1) → (q0Aq0)(q0zq1) | (q0Aq1)(q1zq1) | (q0Aq2)(q2zq1) | (q0Aq3)(q3zq1),
(q3zq2) → (q0Aq0)(q0zq2) | (q0Aq1)(q1zq2) | (q0Aq2)(q2zq2) | (q0Aq3)(q3zq2),
(q3zq3) → (q0Aq0)(q0zq3) | (q0Aq1)(q1zq3) | (q0Aq2)(q2zq3) | (q0Aq3)(q3zq3).


This looks quite complicated, but can be simplified. A variable that does not occur on the left
side of any production must be useless, so we can immediately eliminate (q0Aq0) and (q0Aq2). Also,
by looking at the transition graph of the modified npda, we see that there is no path from q1 to q0, from
q1 to q1, from q1 to q3, and from q2 to q2, which makes the associated variables also useless. When
we eliminate all these useless productions, we are left with the much shorter grammar


(q0Aq3) → a,
(q0Aq1) → b,
(q1zq2) → λ,
(q0zq0) → a(q0Aq3)(q3zq0),
(q0zq1) → a(q0Aq3)(q3zq1),
(q0zq2) → a(q0Aq1)(q1zq2) | a(q0Aq3)(q3zq2),
(q0zq3) → a(q0Aq3)(q3zq3),
(q3zq0) → (q0Aq3)(q3zq0),
(q3zq1) → (q0Aq3)(q3zq1),
(q3zq2) → (q0Aq1)(q1zq2) | (q0Aq3)(q3zq2),
(q3zq3) → (q0Aq3)(q3zq3),


with start variable (q0zq2).
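
The enumeration of productions in this example can be reproduced by a few lines of Python. The sketch below assumes that the npda already satisfies conditions 1 and 2. The representation (variables as triples, terminals as one-character strings, λ as the empty string) and the name `grammar_from_npda` are our own choices.

```python
from itertools import product

LAMBDA = ""  # the empty string stands for lambda


def grammar_from_npda(delta, states):
    """Return productions as (head, body) pairs; a variable is a triple (qi, A, qj)."""
    productions = []
    for (qi, a, A), moves in delta.items():
        for qj, push in moves:
            prefix = (a,) if a else ()                   # a lambda-move contributes no terminal
            if push == LAMBDA:                           # form (7.5): A is erased at once
                productions.append(((qi, A, qj), prefix))
            else:                                        # form (7.6): A is replaced by BC
                B, C = push
                for qk, ql in product(states, repeat=2):
                    body = prefix + ((qj, B, ql), (ql, C, qk))
                    productions.append(((qi, A, qk), body))
    return productions


STATES = ["q0", "q1", "q2", "q3"]
DELTA_7_8 = {                                            # the modified npda of Example 7.8
    ("q0", "a", "z"): {("q0", "Az")},
    ("q3", LAMBDA, "z"): {("q0", "Az")},
    ("q0", "a", "A"): {("q3", LAMBDA)},
    ("q0", "b", "A"): {("q1", LAMBDA)},
    ("q1", LAMBDA, "z"): {("q2", LAMBDA)},
}
PRODUCTIONS = grammar_from_npda(DELTA_7_8, STATES)
print(len(PRODUCTIONS))
```

Each of the two transitions of form (7.6) yields one production for every pair of states, which accounts for the sixteen alternatives listed for each of them before simplification.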


Example 7.9 


Consider the string w = aab. This is accepted by the pda in Example 7.8, with successive
configurations


(q0, aab, z) ⊢ (q0, ab, Az)
⊢ (q3, b, z)
⊢ (q0, b, Az)
⊢ (q1, λ, z)
⊢ (q2, λ, λ).


The corresponding derivation with G is


(q0zq2) ⇒ a(q0Aq3)(q3zq2)
⇒ aa(q3zq2)
⇒ aa(q0Aq1)(q1zq2)
⇒ aab(q1zq2)
⇒ aab.


The steps in the proof of the following theorem will be easier to understand if you notice the
correspondence between the successive instantaneous descriptions of the pda and the sentential forms
in the derivation. The first qi in the leftmost variable of every sentential form is the current state of the
pda, while the sequence of middle symbols is the same as the stack content.


Although the construction yields a rather complicated grammar, it can be applied to any pda whose 
transition rules satisfy the given conditions. This forms the basis for the proof of the general result. 


Theorem 7.2 


If L = L (M) for some npda M, then L is a context-free language. 
Proof: Assume that M = (Q, Σ, Γ, δ, q0, z, {qf}) satisfies conditions 1 and 2 above. We use the
suggested construction to get the grammar G = (V, T, S, P), with T = Σ and V consisting of elements of
the form (qicqj). We will show that the grammar so obtained is such that for all qi, qj ∈ Q, A ∈ Γ, X ∈
Γ*, u, v ∈ Σ*,
(qi, uv, AX) ⊢* (qj, v, X) (7.7)
implies that
(qiAqj) ⇒* u,


and vice versa.


The first part is to show that, whenever the npda is such that the symbol A and its effects can be
removed from the stack while reading u and going from state qi to qj, then the variable (qiAqj) can
derive u. This is not hard to see since the grammar was explicitly constructed to do this. We only
need an induction on the number of moves to make this precise.


For the converse, consider a single step in the derivation such as


(qiAqk) ⇒ a(qjBql)(qlCqk).
Using the corresponding transition for the npda
δ(qi, a, A) = {(qj, BC), ...}, (7.8)


we see that the A can be removed from the stack, BC put on, reading a, with the control unit going
from state qi to qj. Similarly, if


(qiAqj) ⇒ a, (7.9)
then there must be a corresponding transition
δ(qi, a, A) = {(qj, λ)} (7.10)


whereby the A can be popped off the stack. We see from this that the sentential forms derived from
(qiAqj) define a sequence of possible configurations of the npda by which (7.7) can be achieved.


Note that (qiAqk) ⇒ a(qjBql)(qlCqk) might be possible for some (qjBql)(qlCqk) for which there is
no corresponding transition of the form (7.8) or (7.10). But, in that case, at least one of the variables
on the right will be useless. For all sentential forms leading to a terminal string, the argument given
holds.


If we now apply the conclusion to
(q0, w, z) ⊢* (qf, λ, λ),
we see that this can be so if and only if
(q0zqf) ⇒* w.


Consequently L(M) = L(G). ■


EXERCISES 


1. Show that the pda constructed in Example 7.6 accepts the string aaabbbb that is in the language 
generated by the given grammar. 


2. Prove that the pda in Example 7.6 accepts the language L = {a^(n+1) b^(2n) : n ≥ 0}.
3. Construct an npda that accepts the language generated by the grammar

S → aSbb | aab.
4. Construct an npda that accepts the language generated by the grammar S → aSSS | ab.


5. Construct an npda corresponding to the grammar


S → aABB | aAA,
A → aBB | a,
B → bBB | A.


6. Construct an npda that will accept the language generated by the grammar G = ({S, A}, {a, b}, S, P),
with productions S → AA | a, A → SA | b.


7. Show that Theorems 7.1 and 7.2 imply the following. For every npda M, there exists an npda M̂
with at most three states, such that L(M) = L(M̂).


8. Show how the number of states of M̂ in the above exercise can be reduced to two.

9. Find an npda with two states for the language L = {a^n b^(n+1) : n ≥ 0}.

10. Find an npda with two states that accepts L = {a^n b^(2n) : n ≥ 1}.

11. Show that the npda in Example 7.8 accepts L (aa*b). 

12. Show that the grammar in Example 7.8 generates the language L (aa*b). 

13. In Example 7.8, show that the variable (q0zq1) is useless.

14. Use the construction in Theorem 7.1 to find an npda for the language in Example 7.5, Section 7.1. 


15. Find a context-free grammar that generates the language accepted by the npda M = ({q0, q1}, {a,
b}, {A, z}, δ, q0, z, {q1}), with transitions


δ(q0, a, z) = {(q0, Az)},
δ(q0, b, A) = {(q0, AA)},
δ(q0, a, A) = {(q1, λ)}.
16. Show that for every npda there exists an equivalent one satisfying conditions 1 and 2 in the 
preamble to Theorem 7.2. 


17. Give full details of the proof of Theorem 7.2. 


18. Give a construction by which an arbitrary context-free grammar can be used in the proof of 
Theorem 7.1. 


19. Does the grammar in Example 7.8 still have any useless variables? 


7.3 Deterministic Pushdown Automata and Deterministic Context- 
Free Languages 


A deterministic pushdown accepter (dpda) is a pushdown automaton that never has a choice in its 
move. This can be achieved by a modification of Definition 7.1. 


Definition 7.3 


A pushdown automaton M = (Q, Σ, Γ, δ, q0, z, F) is said to be deterministic if it is an automaton as
defined in Definition 7.1, subject to the restrictions that, for every q ∈ Q, a ∈ Σ ∪ {λ} and b ∈ Γ,


1. δ(q, a, b) contains at most one element,


2. if δ(q, λ, b) is not empty, then δ(q, c, b) must be empty for every c ∈ Σ.


The first of these conditions simply requires that for any given input symbol and any stack top, at most
one move can be made. The second condition is that when a λ-move is possible for some
configuration, no input-consuming alternative is available.


It is interesting to note the difference between this definition and the corresponding definition of a
deterministic finite automaton. The domain of the transition function is still as in Definition 7.1 rather
than Q × Σ × Γ because we want to retain λ-transitions. Since the top of the stack plays a role in
determining the next move, the presence of λ-transitions does not automatically imply
nondeterminism. Also, some transitions of a dpda may be to the empty set, that is, undefined, so there
may be dead configurations. This does not affect the definition; the only criterion for determinism is
that at all times at most one possible move exists.
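
The two conditions of Definition 7.3 can be checked mechanically on a transition table. The following Python sketch is one way to do so. The function name `is_deterministic`, the encoding of λ as the empty string, and both sample tables are hypothetical and serve only as illustration.

```python
LAMBDA = ""  # the empty string stands for lambda


def is_deterministic(delta, input_alphabet):
    """Check conditions 1 and 2 of Definition 7.3.

    delta maps (state, input symbol or LAMBDA, stack top) to a set of moves;
    a missing key is treated as the empty set.
    """
    for (state, symbol, top), moves in delta.items():
        if len(moves) > 1:
            return False                       # condition 1 fails: two moves to choose from
        if symbol == LAMBDA and moves:
            for c in input_alphabet:
                if delta.get((state, c, top)):
                    return False               # condition 2 fails: lambda-move has a rival
    return True


# Two hypothetical tables over the input alphabet {a, b} with stack symbol z.
OK_TABLE = {
    ("p", "a", "z"): {("p", "zz")},
    ("p", "b", "z"): {("r", LAMBDA)},
}
BAD_TABLE = {
    ("p", "a", "z"): {("p", "zz")},
    ("p", LAMBDA, "z"): {("r", "z")},          # competes with the a-move above
}
print(is_deterministic(OK_TABLE, "ab"), is_deterministic(BAD_TABLE, "ab"))
```

Undefined entries are simply absent from the dictionary, which mirrors the remark that dead configurations do not affect determinism.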


Definition 7.4 


A language L is said to be a deterministic context-free language if and only if there exists a dpda M 
such that L = L (M). 


Example 7.10 


The language
L = {a^n b^n : n ≥ 0}
is a deterministic context-free language. The pda M = ({q0, q1, q2}, {a, b}, {0, 1}, δ, q0, 0, {q0}) with


δ(q0, a, 0) = {(q1, 10)},
δ(q1, a, 1) = {(q1, 11)},
δ(q1, b, 1) = {(q2, λ)},
δ(q2, b, 1) = {(q2, λ)},
δ(q2, λ, 0) = {(q0, λ)}


accepts the given language. It satisfies the conditions of Definition 7.3 and is therefore deterministic.
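
Because a dpda never has a choice, it can be run with a simple loop and no search. The Python sketch below does this for the automaton of this example. The name `dpda_accepts` and the step bound `max_steps`, a guard against endless λ-moves in other tables, are our own additions.

```python
LAMBDA = ""  # the empty string stands for lambda

# Transition table of Example 7.10; every entry offers exactly one move.
DELTA = {
    ("q0", "a", "0"): ("q1", "10"),
    ("q1", "a", "1"): ("q1", "11"),
    ("q1", "b", "1"): ("q2", LAMBDA),
    ("q2", "b", "1"): ("q2", LAMBDA),
    ("q2", LAMBDA, "0"): ("q0", LAMBDA),
}


def dpda_accepts(w, delta=DELTA, start="q0", stack_start="0",
                 finals=("q0",), max_steps=10_000):
    """Run the dpda; the stack is a string whose first character is the top."""
    state, stack, i = start, stack_start, 0
    for _ in range(max_steps):
        if i == len(w) and state in finals:
            return True                          # input consumed in a final state
        if not stack:
            return False                         # no move is defined on an empty stack
        top = stack[0]
        if (state, LAMBDA, top) in delta:        # by condition 2 this move has no rival
            state, push = delta[(state, LAMBDA, top)]
        elif i < len(w) and (state, w[i], top) in delta:
            state, push = delta[(state, w[i], top)]
            i += 1
        else:
            return False                         # dead configuration
        stack = push + stack[1:]
    return False


print(dpda_accepts("aabb"))
```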


Look now at Example 7.5. The npda there is not deterministic because
δ(q0, a, a) = {(q0, aa)}
and
δ(q0, λ, a) = {(q1, a)}


violate condition 2 of Definition 7.3. This, of course, does not imply that the language {ww^R} itself is
nondeterministic, since there is the possibility of an equivalent dpda. But it is known that the language
is indeed not deterministic. From this and the next example we see that, in contrast to finite automata,
deterministic and nondeterministic pushdown automata are not equivalent. There are context-free
languages that are not deterministic.


Example 7.11 


Let

L1 = {a^n b^n : n ≥ 0}
and

L2 = {a^n b^(2n) : n ≥ 0}.


An obvious modification of the argument that L1 is a context-free language shows that L2 is also
context-free. The language


L = L1 ∪ L2


is context-free as well. This will follow from a general theorem to be presented in the next chapter,
but can easily be made plausible at this point. Let G1 = (V1, T, S1, P1) and G2 = (V2, T, S2, P2) be
context-free grammars such that L1 = L(G1) and L2 = L(G2). If we assume that V1 and V2 are disjoint
and that S ∉ V1 ∪ V2, then, combining the two, grammar G = (V1 ∪ V2 ∪ {S}, T, S, P), where


P = P1 ∪ P2 ∪ {S → S1 | S2},


generates L1 ∪ L2. This should be fairly clear at this point, but the details of the argument will be
deferred until Chapter 8. Accepting this, we see that L is context-free. But L is not a deterministic
context-free language. This seems reasonable, since the pda has either to match one b or two against
each a, and so has to make an initial choice whether the input is in L1 or in L2. There is no
information available at the beginning of the string by which the choice can be made
deterministically. Of course, this sort of argument is based on a particular algorithm we have in mind;
it may lead us to the correct conjecture, but does not prove anything. There is always the possibility
of a completely different approach that avoids an initial choice. But it turns out that there is not, and L
is indeed nondeterministic. To see this we first establish the following claim. If L were a
deterministic context-free language, then


L̂ = L ∪ {a^n b^n c^n : n ≥ 0}


would be a context-free language. We show the latter by constructing an npda M̂ for L̂, given a dpda
M for L.


The idea behind the construction is to add to the control unit of M a similar part in which
transitions caused by the input symbol b are replaced with similar ones for input c. This new part of
the control unit may be entered after M has read a^n b^n. Since the second part responds to c^n in the
same way as the first part does to b^n, the process that recognizes a^n b^(2n) now also accepts a^n b^n c^n.
Figure 7.4 describes the construction graphically; a formal argument follows.


Let M = (Q, Σ, Γ, δ, q0, z, F) with


Q = {q0, q1, ..., qn}.


Figure 7.4 (diagram showing the control unit of M and the addition, connected by λ-transitions)


Then consider M̂ with


Q̂ = Q ∪ {q̂0, q̂1, ..., q̂n},
F̂ = F ∪ {q̂i : qi ∈ F},


and δ̂ constructed from δ by including
δ̂(qf, λ, s) = {(q̂f, s)},


for all qf ∈ F, s ∈ Γ, and


δ̂(q̂i, c, s) = {(q̂j, u)},
for all


δ(qi, b, s) = {(qj, u)},
qi ∈ Q, s ∈ Γ, u ∈ Γ*. For M to accept a^n b^n we must have


(q0, a^n b^n, z) ⊢*_M (qi, λ, u),


with qi ∈ F. Because M is deterministic, it must also be true that


(q0, a^n b^(2n), z) ⊢*_M (qi, b^n, u),


so that for it to accept a^n b^(2n) we must further have
(qi, b^n, u) ⊢*_M (qj, λ, u1),


for some qj ∈ F. But then, by construction


(q̂i, c^n, u) ⊢*_M̂ (q̂j, λ, u1),


so that M̂ will accept a^n b^n c^n. It remains to be shown that no strings other than those in L̂ are
accepted by M̂; this is considered in several exercises at the end of this section. The conclusion is
that L̂ = L(M̂), so that L̂ is context-free. But we will show in the next chapter (Example 8.1) that L̂
is not context-free. Therefore, our assumption that L is a deterministic context-free language must be
false.


EXERCISES 


1. Show that L = {a^n b^(2n) : n ≥ 0} is a deterministic context-free language.
2. Show that L = {a^n b^m : m ≥ n + 2} is deterministic.

3. Is the language L = {a^n b^n : n ≥ 1} ∪ {b} deterministic?

4. Is the language L = {a^n b^n : n ≥ 1} ∪ {a} deterministic?


5. Show that the pushdown automaton in Example 7.4 is not deterministic, but that the language in the 
example is nevertheless deterministic. 


6. For the language L in Exercise 1, show that L* is a deterministic context-free language. 


7. Give reasons why one might conjecture that the following language is not deterministic.


L = {a^n b^m c^k : n = m or m = k}.


8. Is the language L = {a^n b^m : n = m or n = m + 2} deterministic?
9. Is the language {wcw^R : w ∈ {a, b}*} deterministic?


10. While the language in Exercise 9 is deterministic, the closely related language L = {ww^R : w ∈ {a,
b}*} is known to be nondeterministic. Give arguments that make this statement plausible.


11. Show that L = {w ∈ {a, b}* : n_a(w) ≠ n_b(w)} is a deterministic context-free language.


12. Show that M̂ in Example 7.11 does not accept a^n b^n c^k for k ≠ n.


13. Show that M̂ in Example 7.11 does not accept any string not in L(a*b*c*).


14. Show that M̂ in Example 7.11 does not accept a^n b^(2n) c^k with k > 0. Show also that it does not
accept a^n b^m c^k unless m = n or m = 2n.


15. Show that every regular language is a deterministic context-free language. 


16. Show that if L1 is deterministic context-free and L2 is regular, then the language L1 ∪ L2 is
deterministic context-free.


17. Show that under the conditions of Exercise 16, L1 ∩ L2 is a deterministic context-free language.


18. Give an example of a deterministic context-free language whose reverse is not deterministic. 


7.4 Grammars for Deterministic Context-Free Languages* 


The importance of deterministic context-free languages lies in the fact that they can be parsed 
efficiently. We can see this intuitively by viewing the pushdown automaton as a parsing device. Since 
there is no backtracking involved, we can easily write a computer program for it, and we may expect 
that it will work efficiently. Since there may be λ-transitions involved, we cannot immediately claim
that this will yield a linear-time parser, but it puts us on the right track nevertheless. To pursue this, 
let us see what grammars might be suitable for the description of deterministic context-free languages. 
Here we enter a topic important in the study of compilers, but somewhat peripheral to our interests. 
We will provide only a brief introduction to some important results, referring the reader to books on 
compilers for a more thorough treatment. 


Suppose we are parsing top-down, attempting to find the leftmost derivation of a particular 
sentence. For the sake of discussion, we use the approach illustrated in Figure 7.5. We scan the input 
w from left to right, while developing a sentential form whose terminal prefix matches the prefix of w
up to the currently scanned symbol. To proceed in matching consecutive symbols, we would like to 
know exactly which production rule is to be applied at each step. This would avoid backtracking and 
give us an efficient parser. The question then is whether there are grammars that allow us to do this. 
For a general context-free grammar, this is not the case, but if the form of the grammar is restricted, 
we can achieve our goal. 


As a first case, take the s-grammars introduced in Definition 5.4. From the discussion there, it is 
clear that at every stage in the parsing we know exactly which production has to be applied. Suppose 
that $w = w_1 w_2$ and that we have developed the sentential form $w_1 A x$. To get the next symbol of the 
sentential form matched against the next symbol in w, we simply look at the leftmost symbol of $w_2$, say 
a. If there is no rule $A \to ay$ in the grammar, the string w does not belong to the language. If there is 
such a rule, the parsing can proceed. But in this case there is only one such rule, so there is no choice 
to be made. 


Although s-grammars are useful, they are too restrictive to capture all aspects of the syntax of 
programming languages. We need to generalize the idea so that it becomes more powerful without 
losing its essential property for parsing. One such type of grammar is called an LL grammar. In an LL 
grammar we still have the property that we can, by looking at a limited part of the input (consisting 
of the scanned symbol plus a finite number of symbols following it), predict exactly which production 
rule must be used. The term LL is standard usage in books on compilers; the first L stands for the fact 
that the input is scanned from left to right; the second L indicates that leftmost derivations are 
constructed. Every s-grammar is an LL grammar, but the concept is more general. 


Figure 7.5 
[Diagram: the input w shown above the current sentential form, divided into the matched part and the part yet to be matched.] 


Example 7.12 
The grammar 
$S \to aSb \mid ab$ 


is not an s-grammar, but it is an LL grammar. In order to determine which production is to be applied, 
we look at two consecutive symbols of the input string. If the first is an a and the second a b, we must 
apply the production $S \to ab$. Otherwise, the rule $S \to aSb$ must be used. 


We say that a grammar is an LL (k) grammar if we can uniquely identify the correct production, 
given the currently scanned symbol and a “look-ahead” of the next k − 1 symbols. Example 7.12 is an 
example of an LL (2) grammar. 
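
The decision rule of Example 7.12 can be turned directly into a small predictive parser. The following Python sketch is only an illustration: the function name parse_ll2 and the way productions are recorded as strings are our own choices, not part of the text. It inspects the scanned symbol together with one look-ahead symbol and never backtracks.

```python
def parse_ll2(w: str) -> list[str]:
    """Return the productions of the leftmost derivation of w in the
    grammar S -> aSb | ab, or raise ValueError if w is not in the language."""
    used = []   # productions applied, in order
    pos = 0     # index of the currently scanned symbol

    def parse_S() -> None:
        nonlocal pos
        lookahead = w[pos:pos + 2]   # scanned symbol plus one look-ahead symbol
        if lookahead == "ab":
            used.append("S -> ab")
            pos += 2
        elif lookahead[:1] == "a":
            used.append("S -> aSb")
            pos += 1                 # match the leading a
            parse_S()                # parse the inner S
            if w[pos:pos + 1] != "b":
                raise ValueError(f"expected b at position {pos}")
            pos += 1                 # match the trailing b
        else:
            raise ValueError(f"no production applies at position {pos}")

    parse_S()
    if pos != len(w):
        raise ValueError("unmatched input remains")
    return used
```

For the input aabb the parser first sees aa and so applies the recursive production, then sees ab and finishes with the second production; at no point is a choice revisited.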


Example 7.13 


The grammar 


$S \to SS \mid aSb \mid ab$ 


generates the positive closure of the language in Example 7.12. As remarked in Example 5.4, this is 
the language of properly nested parenthesis structures. The grammar is not an LL (k) grammar for any 
k. 


To see why this is so, look at the derivation of strings of length greater than two. To start, we 
have available two possible productions $S \to SS$ and $S \to aSb$. The scanned symbol does not tell us 
which is the right one. Suppose we now use a look-ahead and consider the first two symbols, finding 
that they are aa. Does this allow us to make the right decision? The answer is still no, since what we 
have seen could be a prefix of a number of strings, including both aabb and aabbab. In the first case, 
we must start with $S \to aSb$, while in the second it is necessary to use $S \to SS$. The grammar is 
therefore not an LL (2) grammar. In a similar fashion, we can see that no matter how many look-ahead 
symbols we have, there are always some situations that cannot be resolved. 


This observation about the grammar does not imply that the language is not deterministic or that 
no LL grammar for it exists. We can find an LL grammar for the language if we analyze the reason for 
the failure of the original grammar. The difficulty lies in the fact that we cannot predict how many 
repetitions of the basic pattern $a^n b^n$ there are until we get to the end of the string, yet the grammar 
requires an immediate decision. Rewriting the grammar avoids this difficulty. The grammar 


$S \to aSbS \mid \lambda$ 


is an LL-grammar nearly equivalent to the original grammar. 


To see this, consider the leftmost derivation of w = abab. Then 


$S \Rightarrow aSbS \Rightarrow abS \Rightarrow abaSbS \Rightarrow ababS \Rightarrow abab.$ 


We see that we never have any choice. When the input symbol examined is a, we must use $S \to aSbS$; 
when the symbol is b or we are at the end of the string, we must use $S \to \lambda$. 


But the problem is not yet completely solved because the new grammar can generate the empty 
string. We fix this by introducing a new start variable $S_0$ and a production to ensure that some 
nonempty string is generated. The final result 


$S_0 \to aSbS,$ 
$S \to aSbS \mid \lambda$ 


is then an LL-grammar equivalent to the original grammar. 
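
Because one input symbol always determines the production, the final grammar can be parsed with an explicit stack and no backtracking, much as a deterministic pushdown automaton would operate. The sketch below is illustrative only; the name parse_ll1, the end marker "$", and the list-based stack are assumptions made for the example.

```python
def parse_ll1(w: str) -> bool:
    """Predictive parse for S0 -> aSbS, S -> aSbS | lambda.
    Returns True exactly when w is generated by the grammar."""
    body = ["a", "S", "b", "S"]        # right side aSbS
    stack = ["S0"]                     # top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = w[pos] if pos < len(w) else "$"   # "$" stands for end of input
        if top == "S0":
            if look != "a":
                return False           # S0 has only the production S0 -> aSbS
            stack.extend(reversed(body))
        elif top == "S":
            if look == "a":
                stack.extend(reversed(body))     # S -> aSbS
            # on b or end of input: S -> lambda, so nothing is pushed
        else:                          # top is a terminal that must match the input
            if look != top:
                return False
            pos += 1
    return pos == len(w)
```

On abab the parser applies exactly the productions of the derivation shown above, and it rejects the empty string because the start variable $S_0$ cannot derive it.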


While this informal description of LL grammars is adequate for understanding simple examples, 
we need a more precise definition if any rigorous results are to be developed. We conclude our 
discussion with such a definition. 


Definition 7.5 


Let G = (V, T, S, P) be a context-free grammar. If for every pair of leftmost derivations 


$S \overset{*}{\Rightarrow} w_1 A x_1 \Rightarrow w_1 y_1 x_1 \overset{*}{\Rightarrow} w_1 w_2,$ 


$S \overset{*}{\Rightarrow} w_1 A x_2 \Rightarrow w_1 y_2 x_2 \overset{*}{\Rightarrow} w_1 w_3,$ 


with $w_1, w_2, w_3 \in T^*$, the equality of the k leftmost symbols of $w_2$ and $w_3$ implies $y_1 = y_2$, then G is said 
to be an LL (k) grammar. (If $|w_2|$ or $|w_3|$ is less than k, then k is replaced by the smaller of these.) 


The definition makes precise what has already been indicated. If at any stage in the leftmost 
derivation ($w_1 A x$) we know the next k symbols of the input, the next step in the derivation is uniquely 
determined (as expressed by $y_1 = y_2$). 


The topic of LL grammars is an important one in the study of compilers. A number of 
programming languages can be defined by LL grammars, and many compilers have been written using 
LL parsers. But LL grammars are not sufficiently general to deal with all deterministic context-free 
languages. Consequently, there is interest in other, more general deterministic grammars. Particularly 
important are the so-called LR grammars, which also allow efficient parsing, but can be viewed as 
constructing the derivation tree from the bottom up. There is a great deal of material on this subject 
that can be found in books on compilers (e.g., Hunter 1981) or books specifically devoted to parsing 
methods for formal languages (such as Aho and Ullman 1972). 


EXERCISES 


1. Show that the second grammar in Example 7.13 is an LL grammar and that it is equivalent to the 
original grammar. 


2. Show that the grammar for $L = \{w : n_a(w) = n_b(w)\}$ given in Example 1.13 is not an LL grammar. 

3. Find an LL grammar for the language in Exercise 2. 

4. Construct an LL grammar for the language $L(a^*ba) \cup L(abbb^*)$. 

5. Show that any LL grammar is unambiguous. 

6. Show that if G is an LL (k) grammar, then L (G) is a deterministic context-free language. 

7. Show that a deterministic context-free language is never inherently ambiguous. 

8. Let G be a context-free grammar in Greibach normal form. Describe an algorithm which, for any 
given k, determines whether or not G is an LL (k) grammar. 


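9. Give LL grammars for the following languages, assuming $\Sigma = \{a, b, c\}$. 
(a) $L = \{a^n b^m c^{n+m} : n \ge 0, m \ge 0\}$. 
(b) $L = \{a^{n+2} b^m c^{n+m} : n \ge 0, m \ge 0\}$. 
(c) $L = \{a^n b^{n+2} c^m : n \ge 0, m > 1\}$. 
(d) $L = \{w : n_a(w) < n_b(w)\}$. 
(e) $L = \{w : n_a(w) + n_b(w) \neq n_c(w)\}$. 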


Chapter 8 


Properties of 
Context-Free 
Languages 


The family of context-free languages occupies a central position in a hierarchy of formal 
languages. On the one hand, context-free languages include important but restricted 
language families such as regular and deterministic context-free languages. On the other 
hand, there are broader language families of which context-free languages are a special 
case. To study the relationship between language families and to exhibit their similarities 
and differences, we investigate characteristic properties of the various families. As in Chapter 4, we 
look at closure under a variety of operations, algorithms for determining properties of members of the 
family, and structural results such as pumping lemmas. These all provide us with a means of 
understanding relations between the different families as well as for classifying specific languages in 
an appropriate category. 


8.1 Two Pumping Lemmas 


The pumping lemma given in Theorem 4.8 is an effective tool for showing that certain languages are 
not regular. Similar pumping lemmas are known for other language families. Here we will discuss 
two such results, one for context-free languages in general, the other for a restricted type of context- 
free language. 


A Pumping Lemma for Context-Free Languages 


Theorem 8.1 


Let L be an infinite context-free language. Then there exists some positive integer m such that any $w \in L$ 
with $|w| \ge m$ can be decomposed as 


$w = uvxyz,$ (8.1) 


with 


$|vxy| \le m,$ (8.2) 


and 


$|vy| \ge 1,$ (8.3) 


such that 


$u v^i x y^i z \in L,$ (8.4) 


for all i = 0, 1, 2, .... This is known as the pumping lemma for context-free languages. 
Proof: Consider the language $L - \{\lambda\}$, and assume that we have for it a grammar G without unit- 
productions or $\lambda$-productions. Since the length of the string on the right side of any production is 
bounded, say by k, the length of the derivation of any $w \in L$ must be at least |w|/k. Therefore, since L is 
infinite, there exist arbitrarily long derivations and corresponding derivation trees of arbitrary height. 
Consider now such a high derivation tree and some sufficiently long path from the root to a leaf. 
Since the number of variables in G is finite, there must be some variable that repeats on this path, as 
shown schematically in Figure 8.1. Corresponding to the derivation tree in Figure 8.1, we have the 
derivation 


$S \overset{*}{\Rightarrow} uAz \overset{*}{\Rightarrow} uvAyz \overset{*}{\Rightarrow} uvxyz,$ 


where u, v, x, y, and z are all strings of terminals. From the above we see that $A \overset{*}{\Rightarrow} vAy$ and 
$A \overset{*}{\Rightarrow} x$, so all the strings $u v^i x y^i z$, i = 0, 1, 2, ..., can be generated by the grammar and are therefore in L. 


Furthermore, in the derivations $A \overset{*}{\Rightarrow} vAy$ and $A \overset{*}{\Rightarrow} x$, we can assume that no variable repeats. To 
see this, look at the sketch of the derivation tree in Figure 8.1. In the subtree $T_5$ no variable repeats; 
otherwise we could just apply the argument to this repeating variable. Similarly, we can assume that 
no variable repeats in the subtrees $T_3$ and $T_4$. Therefore, the lengths of the strings v, x, and y depend 
only on the productions of the grammar and can be bounded independently of w so that (8.2) holds. 
Finally, since there are no unit-productions and no $\lambda$-productions, v and y cannot both be empty 
strings, giving (8.3). 


Figure 8.1 


This completes the argument that (8.1) to (8.4) hold. ■ 


This pumping lemma is useful in showing that a language does not belong to the family of context- 
free languages. Its application is typical of pumping lemmas in general; they are used negatively to 
show that a given language does not belong to some family. As in Theorem 4.8, the correct argument 
can be visualized as a game against an intelligent opponent. But now the rules make it a little more 
difficult for us. For regular languages, the substring xy whose length is bounded by m starts at the left 
end of w. Therefore, the substring y that can be pumped is within m symbols of the beginning of w. 
For context-free languages, we only have a bound on |vxy|. The substring u that precedes vxy can be 
arbitrarily long. This gives additional freedom to the adversary, making arguments involving Theorem 
8.1 a little more complicated. 


Example 8.1 


Show that the language 
$L = \{a^n b^n c^n : n \ge 0\}$ 


is not context-free. 


Once the adversary has chosen m, we pick the string $a^m b^m c^m$, which is in L. The adversary now 
has several choices. If he chooses vxy to contain only a's, then the pumped string will obviously not 
be in L. If he chooses a decomposition so that v and y are composed of an equal number of a's and b's, 
then the pumped string $a^k b^k c^m$ with $k \neq m$ can be generated, and again we have generated a string not 
in L. In fact, the only way the adversary could stop us from winning is to pick vxy so that vy has the 
same number of a's, b's, and c's. But this is not possible because of restriction (8.2). Therefore, L is 
not context-free. 


If we try the same argument on the language $L = \{a^n b^n : n \ge 0\}$, we fail, as we must, since the language is 
context-free. If we pick any string in L, such as $w = a^m b^m$, the adversary can pick $v = a^k$ and $y = b^k$. 
Now, no matter what i we choose, the resulting pumped string $w_i$ is in L. Remember, though, that this 


does not prove that L is context-free; all we can say is that we have been unable to get any conclusion 
from the pumping lemma. That L is context-free must come from some other argument, such as the 
construction of a context-free grammar. 
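
For small values of m the pumping game can be played out exhaustively by machine. The following sketch is an illustration under our own assumptions: the function names are hypothetical, and only the exponents i = 0, ..., max_i are tried. A result of False therefore shows that every decomposition allowed by (8.2) and (8.3) is defeated for that m, while a result of True is only evidence that the adversary has a surviving choice, not a proof.

```python
def in_anbncn(s: str) -> bool:
    """Membership in {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def in_anbn(s: str) -> bool:
    """Membership in {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def adversary_can_win(w: str, m: int, member, max_i: int = 3) -> bool:
    """True if some decomposition w = uvxyz with |vxy| <= m and |vy| >= 1
    keeps u v^i x y^i z in the language for every i in 0..max_i."""
    n = len(w)
    for start in range(n):
        for end in range(start + 1, min(start + m, n) + 1):   # vxy = w[start:end]
            for v_end in range(start, end + 1):
                for y_start in range(v_end, end + 1):
                    v, x, y = w[start:v_end], w[v_end:y_start], w[y_start:end]
                    if not v and not y:
                        continue                               # violates (8.3)
                    u, z = w[:start], w[end:]
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return True
    return False
```

With m = 4, the call adversary_can_win("aaaabbbbcccc", 4, in_anbncn) finds no surviving decomposition, in agreement with Example 8.1, whereas for the string aaabbb and the membership test in_anbn the choice v = a, y = b survives.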


The argument also justifies a claim made in Example 7.11 and allows us to close a gap in that 
example. The language 


$\hat{L} = \{a^n b^n\} \cup \{a^n b^{2n}\} \cup \{a^n b^n c^n\}$ 


is not context-free. The string $a^m b^m c^m$ is in $\hat{L}$, but the pumped result is not. 


Example 8.2 


Consider the language 
$L = \{ww : w \in \{a, b\}^*\}.$ 


Although this language appears to be very similar to the context-free language of Example 5.1, it is 
not context-free. 


Take the string 
$a^m b^m a^m b^m.$ 


There are many ways in which the adversary can now pick vxy, but for all of them we have a winning 
countermove. For example, for the choice in Figure 8.2, we can use i = 0 to get a string of the form 


$a^k b^j a^m b^m$, $k < m$ or $j < m$, 


which is not in L. For other choices by the adversary, similar arguments can be made. We conclude 
that L is not context-free. 


Figure 8.2 


Example 8.3 


Show that the language 

$L = \{a^{n!} : n \ge 0\}$ 

is not context-free. 


Given the opponent's choice for m, we pick $w = a^{m!}$. Obviously, whatever the decomposition is, it 
must be of the form $v = a^k$, $y = a^l$. Then $w_0 = uxz$ has length $m! - (k + l)$. This string is in L only if 
$m! - (k + l) = j!$ 
for some j. But this is impossible, since with $k + l \le m$, 
$m! - (k + l) > (m - 1)!.$ 


Therefore, the language is not context-free. 
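
The arithmetic in Example 8.3 is easy to check numerically. The sketch below (the helper names are our own) confirms for a few values of m that no length m! − d with 1 ≤ d ≤ m is a factorial. Note that the inequality above needs m ≥ 3; this is harmless, because if the pumping lemma holds for some m it also holds for every larger value, so we may always assume m is at least 3.

```python
from math import factorial

def is_factorial(n: int) -> bool:
    """True if n = j! for some integer j >= 0."""
    if n < 1:
        return False
    j, f = 1, 1
    while f < n:
        j += 1
        f *= j
    return f == n

def pumping_down_leaves_language(m: int) -> bool:
    """True if m! - d is not a factorial for every d = k + l with 1 <= d <= m."""
    return all(not is_factorial(factorial(m) - d) for d in range(1, m + 1))
```

For m = 2 the check fails, since 2! − 1 = 1 = 1!, which is why the restriction to m ≥ 3 matters.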


Example 8.4 


Show that the language 


$L = \{a^n b^j : n = j^2\}$ 
is not context-free. 

Given m in Theorem 8.1, we pick as our string $a^{m^2} b^m$. The adversary now has several choices. 
The only one that requires much thought is the one shown in Figure 8.3. Pumping i times will yield a 
new string with $m^2 + (i - 1)k_1$ a's and $m + (i - 1)k_2$ b's. If the adversary takes $k_1 \neq 0$, $k_2 \neq 0$, we can 
pick i = 0. Since 


$(m - k_2)^2 \le (m - 1)^2 = m^2 - 2m + 1 < m^2 - k_1,$ 


the result is not in L. If the opponent picks $k_1 = 0$, $k_2 \neq 0$ or $k_1 \neq 0$, $k_2 = 0$, then again 
with i = 0, the pumped string is not in L. We can conclude from this that L is not a context-free 
language. 


Figure 8.3 


A Pumping Lemma for Linear Languages 


We previously made a distinction between linear and nonlinear context-free grammars. We now make 
a similar distinction between languages. 


Definition 8.1 


A context-free language L is said to be linear if there exists a linear context-free grammar G such that 
L=L (G). 


Clearly, every linear language is context-free, but we have not yet established whether or not the 
converse is true. 


Example 8.5 


The language $L = \{a^n b^n : n \ge 0\}$ is a linear language. A linear grammar for it is given in Example 
1.11. The grammar given in Example 1.13 for the language $L = \{w : n_a(w) = n_b(w)\}$ is not linear, so 
the second language is not necessarily linear. 


Of course, just because a specific grammar is not linear does not imply that the language 
generated by it is not linear. If we want to prove that a language is not linear, we must show that there 
exists no equivalent linear grammar. We approach this in the usual way, establishing structural 
properties for linear languages, then showing that some context-free languages do not have a required 


property. 


Theorem 8.2 


Let L be an infinite linear language. Then there exists some positive integer m, such that any $w \in L$ 
with $|w| \ge m$ can be decomposed as w = uvxyz with 


$|uvyz| \le m,$ (8.5) 


$|vy| \ge 1,$ (8.6) 


such that 


$u v^i x y^i z \in L,$ (8.7) 


for all i = 0, 1, 2, .... 

Note that the conclusions of this theorem differ from those of Theorem 8.1, since (8.2) is replaced 
by (8.5). This implies that the strings v and y to be pumped must now be located within m symbols of 
the left and right ends of w, respectively. The middle string x can be of arbitrary length. 


Proof: Since the language is linear there exists a linear grammar G for it. For the argument it is 
convenient to assume that G has no unit-productions and no $\lambda$-productions. An examination of the 
proofs of Theorems 6.3 and 6.4 makes it clear that removing unit-productions and $\lambda$-productions does 
not destroy the linearity of the grammar. We can therefore assume that G has the required property. 


Consider now the derivation of a string $w \in L(G)$ 


$S \overset{*}{\Rightarrow} uAz \overset{*}{\Rightarrow} uvAyz \overset{*}{\Rightarrow} uvxyz = w.$ 


Assume, for the moment, that for every $w \in L(G)$, there is a variable A, such that 

1. in the partial derivation $S \overset{*}{\Rightarrow} uAz$ no variable is repeated, 

2. in the partial derivation $S \overset{*}{\Rightarrow} uAz \overset{*}{\Rightarrow} uvAyz$ no variable except A is repeated, 

3. the repetition of A must occur in the first m steps, where m can depend on the grammar, but not on 
w. 

If this is true, then the lengths of u, v, y, z must be bounded independent of w. This in turn implies 
that (8.5), (8.6), and (8.7) must hold. 


To complete the argument, we must still demonstrate that the above conditions hold for every 
linear grammar. This is not hard to see if we look at sequences in which the variables can occur. We 
will omit the details here, but leave them as an exercise (see Exercise 16 at the end of this section). ■ 


Example 8.6 


The language 
$L = \{w : n_a(w) = n_b(w)\}$ 


is not linear. 
To show this, assume that the language is linear and apply Theorem 8.2 to the string 
$w = a^m b^{2m} a^m.$ 


Inequality (8.5) shows that in this case the strings u, v, y, z must all consist entirely of a's. If we pump 
this string once, we get $a^{m+k} b^{2m} a^{m+l}$ with either $k \ge 1$ or $l \ge 1$, a result that is not in L. This 
contradiction of Theorem 8.2 proves that the language is not linear. 
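
The difference between (8.2) and (8.5) shows up clearly when the game for linear languages is played by exhaustive search. In the following sketch (function names and the cutoff max_i are our own assumptions) the strings u, v are taken from the left end and y, z from the right end of w, with total length at most m. As before, a result of False is conclusive for the given m, while True is only evidence.

```python
def equal_ab(s: str) -> bool:
    """Membership in {w : n_a(w) = n_b(w)}."""
    return s.count("a") == s.count("b")

def is_anbn(s: str) -> bool:
    """Membership in {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def linear_adversary_can_win(w: str, m: int, member, max_i: int = 3) -> bool:
    """True if some w = uvxyz with |uvyz| <= m and |vy| >= 1 keeps
    u v^i x y^i z in the language for every i in 0..max_i."""
    n = len(w)
    for lu in range(m + 1):
        for lv in range(m - lu + 1):
            for lz in range(m - lu - lv + 1):
                for ly in range(m - lu - lv - lz + 1):
                    if lv + ly == 0 or lu + lv + ly + lz > n:
                        continue
                    u, v = w[:lu], w[lu:lu + lv]            # taken from the left end
                    y, z = w[n - lz - ly:n - lz], w[n - lz:]  # taken from the right end
                    x = w[lu + lv:n - lz - ly]              # middle part, any length
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return True
    return False
```

For m = 3 and w = aaabbbbbbaaa the search finds no surviving decomposition, as in Example 8.6, while for the linear language of Example 8.5 and w = aaabbb the adversary survives by pumping the first a together with the last b.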


This example answers the general question raised on the relation between the families of context- 
free and linear languages. The family of linear languages is a proper subset of the family of context- 
free languages. 


EXERCISES 


1. Show that the language 
$L = \{w \in \{a, b, c\}^* : n_a(w) = n_b(w) \le n_c(w)\}$ 
is not context-free. 


2. Show that the language $L = \{a^n : n \text{ is a prime number}\}$ is not context-free. 
3. Show that $L = \{w w^R w : w \in \{a, b\}^*\}$ is not a context-free language. 
4. Show that $L = \{w \in \{a, b, c\}^* : n_a^2(w) + n_b^2(w) = n_c^2(w)\}$ is not context-free. 
5. Is the language $L = \{a^n b^m : n = 2^m\}$ context-free? 
6. Show that the language $L = \{a^{n^2} : n \ge 0\}$ is not context-free. 
7. Show that the following languages on $\Sigma = \{a, b, c\}$ are not context-free. 
(a) $L = \{a^n b^j : n \le j^2\}$. 
(b) $L = \{a^n b^j : n \ge (j - 1)^3\}$. 
(c) $L = \{a^n b^j c^k : k = jn\}$. 
(d) $L = \{a^n b^j c^k : k > n, k > j\}$. 
(e) $L = \{a^n b^j c^k : n < j, n \le k \le j\}$. 
(f) $L = \{w : n_a(w) < n_b(w) < n_c(w)\}$. 
(g) $L = \{w : n_a(w) / n_b(w) = n_c(w)\}$. 
(h) $L = \{w \in \{a, b, c\}^* : n_a(w) + n_b(w) = 2n_c(w), n_a(w) = n_b(w)\}$. 
(i) $L = \{a^n b^m : n \text{ and } m \text{ are both prime}\}$. 
(j) $L = \{a^n b^m : n \text{ is prime or } m \text{ is prime}\}$. 
(k) $L = \{a^n b^m : n \text{ is prime and } m \text{ is not prime}\}$. 
8. Determine whether or not the following languages are context-free. 
(a) $L = \{a^n w w^R a^n : n \ge 0, w \in \{a, b\}^*\}$. 
(b) $L = \{a^n b^j a^n b^j : n \ge 0, j \ge 0\}$. 
(c) $L = \{a^n b^j a^j b^n : n \ge 0, j \ge 0\}$. 
(d) $L = \{a^n b^j a^k b^l : n + j \le k + l\}$. 
(e) $L = \{a^n b^j a^k b^l : n \le k, j \le l\}$. 
(f) $L = \{a^n b^n c^j : n \le j\}$. 
(g) $L = \{w \in \{a, b, c\}^* : n_a(w) = n_b(w) = 2n_c(w)\}$. 
9. In Theorem 8.1, find a bound for m in terms of the properties of the grammar G. 


10. Determine whether or not the following language is context-free. 
$L = \{w_1 c w_2 : w_1, w_2 \in \{a, b\}^*, w_1 \neq w_2\}$. 


11. Show that the language $L = \{a^n b^n a^m b^m : n \ge 0, m \ge 0\}$ is context-free but not linear. 


12. Show that the following language is not linear. 
$L = \{w : n_a(w) \ge n_b(w)\}$. 


13. Show that the language $L = \{w \in \{a, b, c\}^* : n_a(w) + n_b(w) = n_c(w)\}$ is context-free, but not linear. 
14. Determine whether or not the language $L = \{a^n b^j : j \le n \le 2j - 1\}$ is linear. 
15. Determine whether or not the language in Example 5.12 is linear. 


16. Let G be a linear grammar with k variables. Show that when we write any sequence of variables 
there must be some variable A that repeats so that 


(a) the first occurrence of A must be in position $p \le k$, 
(b) the repetition of A must be no later than $q \le k + 1$, and 
(c) there can be no other repeating variable between positions p and q. 


17. Justify the claim made in Theorem 8.2 that for any linear language (not containing $\lambda$) there exists 
a linear grammar without $\lambda$-productions and unit-productions. 


18. Consider the set of all strings a/b, where a and b are positive decimal integers such that a < b. 
The set of strings then represents all possible decimal fractions. Determine whether or not this is 
a context-free language. 
*19. Show that the complement of the language in Exercise 6 is not context-free. 
20. Is the language $L = \{a^{nm} : n \text{ and } m \text{ are prime numbers}\}$ context-free? 
*21. It is known that the language 
ES faite” iF my} 


is not context-free. (See the next exercise.) Show that, in spite of this, it is not possible to use 
Theorem 8.1 to prove it. 


22. Ogden's lemma is an extension of Theorem 8.1 that necessitates some changes in the way the 
pumping lemma game is played. In particular, 
(a) You can choose any $w \in L$ with $|w| \ge m$, but you must mark at least m symbols in w. You can 
choose which symbols to mark. 


(b) The opponent must select the decomposition w = uvxyz with the additional restriction that 
either vx or xy must have at least one marked position. 


Notice that Theorem 8.1 is a special case of Ogden's lemma in which all symbols of w are marked. 
Show how Ogden's lemma can be used to prove that the language in the previous exercise is not 
context-free and conclude from this that Ogden's lemma is more powerful than Theorem 8.1. 


8.2 Closure Properties and Decision Algorithms for Context-Free 
Languages 


In Chapter 4 we looked at closure under certain operations and algorithms to decide on the properties 
of the family of regular languages. On the whole, the questions raised there had easy answers. When 
we ask the same questions about context-free languages, we encounter more difficulties. First, closure 
properties that hold for regular languages do not always hold for context-free languages. When they 
do, the arguments needed to prove them are often quite complicated. Second, many intuitively simple 
and important questions about context-free languages cannot be answered. This statement may seem at 
first surprising and will need to be elaborated as we proceed. In this section, we provide only a 
sample of some of the most important results. 


Closure of Context-Free Languages 


Theorem 8.3 


The family of context-free languages is closed under union, concatenation, and star-closure. 

Proof: Let $L_1$ and $L_2$ be two context-free languages generated by the context-free grammars $G_1 = (V_1, 
T_1, S_1, P_1)$ and $G_2 = (V_2, T_2, S_2, P_2)$, respectively. We can assume without loss of generality that the 
sets $V_1$ and $V_2$ are disjoint. 


Consider now the language $L(G_3)$, generated by the grammar 

$G_3 = (V_1 \cup V_2 \cup \{S_3\}, T_1 \cup T_2, S_3, P_3),$ 

where $S_3$ is a variable not in $V_1 \cup V_2$. The productions of $G_3$ are all the productions of $G_1$ and $G_2$, 
together with an alternative starting production that allows us to use one or the other grammars. More 
precisely, 

$P_3 = P_1 \cup P_2 \cup \{S_3 \to S_1 \mid S_2\}.$ 

Obviously, $G_3$ is a context-free grammar, so that $L(G_3)$ is a context-free language. But it is easy to 
see that 

$L(G_3) = L_1 \cup L_2.$ (8.8) 

Suppose, for instance, that $w \in L_1$. Then 

$S_3 \Rightarrow S_1 \overset{*}{\Rightarrow} w$ 

is a possible derivation in grammar $G_3$. A similar argument can be made for $w \in L_2$. Also, if $w \in 
L(G_3)$, then either 

$S_3 \Rightarrow S_1$ (8.9) 

or 

$S_3 \Rightarrow S_2$ (8.10) 

must be the first step of the derivation. Suppose (8.9) is used. Since sentential forms derived from $S_1$ 
have variables in $V_1$, and $V_1$ and $V_2$ are disjoint, the derivation 

$S_1 \overset{*}{\Rightarrow} w$ 

can involve productions in $P_1$ only. Hence w must be in $L_1$. Alternatively, if (8.10) is used first, then 
w must be in $L_2$ and it follows that $L(G_3)$ is the union of $L_1$ and $L_2$. 


Next, consider 

$G_4 = (V_1 \cup V_2 \cup \{S_4\}, T_1 \cup T_2, S_4, P_4).$ 

Here again $S_4$ is a new variable and 

$P_4 = P_1 \cup P_2 \cup \{S_4 \to S_1 S_2\}.$ 

Then 

$L(G_4) = L(G_1) L(G_2)$ 

follows easily. 


Finally, consider $L(G_5)$ with 

$G_5 = (V_1 \cup \{S_5\}, T_1, S_5, P_5),$ 

where $S_5$ is a new variable and 

$P_5 = P_1 \cup \{S_5 \to S_1 S_5 \mid \lambda\}.$ 

Then 

$L(G_5) = L(G_1)^*.$ 


Thus we have shown that the family of context-free languages is closed under union, 
concatenation, and star-closure. ■ 
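
The three constructions in the proof are mechanical enough to express as short functions. In the sketch below a grammar is represented, purely as an assumption for this illustration, by a Python dictionary mapping each variable to a list of right sides (tuples of symbols); any symbol that is not a key is treated as a terminal. The helper language_up_to performs a bounded search and is meant only for checking small examples; its cap on the length of sentential forms is a heuristic, not part of the theorem.

```python
from collections import deque

def union(g1, s1, g2, s2, new="S3"):
    """G3: all productions of G1 and G2 plus S3 -> S1 | S2."""
    assert not (g1.keys() & g2.keys()) and new not in g1 and new not in g2
    return {**g1, **g2, new: [(s1,), (s2,)]}, new

def concatenation(g1, s1, g2, s2, new="S4"):
    """G4: all productions of G1 and G2 plus S4 -> S1 S2."""
    assert not (g1.keys() & g2.keys()) and new not in g1 and new not in g2
    return {**g1, **g2, new: [(s1, s2)]}, new

def star(g1, s1, new="S5"):
    """G5: all productions of G1 plus S5 -> S1 S5 | lambda."""
    assert new not in g1
    return {**g1, new: [(s1, new), ()]}, new

def language_up_to(grammar, start, n):
    """Terminal strings of length <= n found by a bounded leftmost search."""
    limit = 2 * n + 2                      # heuristic cap on sentential-form length
    seen, found = {(start,)}, set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                    # no variables left: a sentence
            if len(form) <= n:
                found.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:     # expand the leftmost variable
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= n and len(new) <= limit and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

# Example grammars: G1 generates a^n b^n (n >= 1), G2 generates c^n (n >= 1).
G1 = {"S1": [("a", "S1", "b"), ("a", "b")]}
G2 = {"S2": [("c", "S2"), ("c",)]}
```

The assertions on disjoint variable sets mirror the assumption made at the start of the proof; without it, productions of the two grammars could interact.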


Theorem 8.4 


The family of context-free languages is not closed under intersection and complementation. 


Proof: Consider the two languages 


$L_1 = \{a^n b^n c^m : n \ge 0, m \ge 0\}$ 


and 

$L_2 = \{a^n b^m c^m : n \ge 0, m \ge 0\}.$ 


There are several ways one can show that $L_1$ and $L_2$ are context-free. For instance, a grammar for $L_1$ 
is 

$S \to S_1 S_2,$ 
$S_1 \to a S_1 b \mid \lambda,$ 
$S_2 \to c S_2 \mid \lambda.$ 


Alternatively, we note that $L_1$ is the concatenation of two context-free languages, so it is context-free 
by Theorem 8.3. But 


$L_1 \cap L_2 = \{a^n b^n c^n : n \ge 0\},$ 


which we have already shown not to be context-free. Thus, the family of context-free languages is not 
closed under intersection. 


The second part of the theorem follows from Theorem 8.3 and the set identity 


$L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}.$ 


If the family of context-free languages were closed under complementation, then the right side of the 
above expression would be a context-free language for any context-free $L_1$ and $L_2$. But this 
contradicts what we have just shown, that the intersection of two context-free languages is not 
necessarily context-free. Consequently, the family of context-free languages is not closed under 
complementation. ■ 

While the intersection of two context-free languages may produce a language that is not context- 
free, the closure property holds if one of the languages is regular. 


Theorem 8.5 


Let $L_1$ be a context-free language and $L_2$ be a regular language. Then $L_1 \cap L_2$ is context-free. 


Proof: Let $M_1 = (Q, \Sigma, \Gamma, \delta_1, q_0, z, F_1)$ be an npda that accepts $L_1$ and $M_2 = (P, \Sigma, \delta_2, p_0, F_2)$ be a dfa 
that accepts $L_2$. We construct a push-down automaton $\hat{M} = (\hat{Q}, \Sigma, \Gamma, \hat{\delta}, \hat{q}_0, z, \hat{F})$ that simulates the 
parallel action of $M_1$ and $M_2$: Whenever a symbol is read from the input string, $\hat{M}$ simultaneously 
executes the moves of $M_1$ and $M_2$. To this end we let 


$\hat{Q} = Q \times P,$ 
$\hat{q}_0 = (q_0, p_0),$ 
$\hat{F} = F_1 \times F_2,$ 


and define $\hat{\delta}$ such that 

$((q_k, p_l), x) \in \hat{\delta}((q_i, p_j), a, b)$ 

if and only if 

$(q_k, x) \in \delta_1(q_i, a, b)$ 

and 

$\delta_2(p_j, a) = p_l.$ 


In this, we also require that if $a = \lambda$, then $p_j = p_l$. In other words, the states of $\hat{M}$ are labeled with pairs 
$(q_i, p_j)$, representing the respective states in which $M_1$ and $M_2$ can be after reading a certain input 
string. It is a straightforward induction argument to show that 


$((q_0, p_0), w, z) \vdash^*_{\hat{M}} ((q_r, p_s), \lambda, x),$ 


with $q_r \in F_1$ and $p_s \in F_2$ if and only if 


$(q_0, w, z) \vdash^*_{M_1} (q_r, \lambda, x),$ 

and 

$\delta_2^*(p_0, w) = p_s.$ 


Therefore, a string is accepted by $\hat{M}$ if and only if it is accepted by $M_1$ and $M_2$, that is, if it is in 
$L(M_1) \cap L(M_2) = L_1 \cap L_2$. ■ 
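
The product construction in the proof can be sketched in a few lines. The representation below is an assumption made for illustration: npda transitions are a dictionary from (state, input symbol, stack top) to a set of (next state, string to push), the empty string "" stands for $\lambda$, and the stack is a string whose first character is the top. The simulator accepts explores configurations exhaustively and is intended only for small machines whose $\lambda$-moves cannot grow the stack without bound.

```python
def product(delta1, delta2, dfa_states):
    """Transition function of the pda that runs an npda and a dfa in parallel."""
    delta = {}
    for (q, a, b), moves in delta1.items():
        for p in dfa_states:
            if a == "":
                p_next = p                    # lambda-move: the dfa does not move
            elif (p, a) in delta2:
                p_next = delta2[(p, a)]
            else:
                continue
            delta[((q, p), a, b)] = {((q2, p_next), push) for q2, push in moves}
    return delta

def accepts(delta, start, finals, w, z="z"):
    """Acceptance by final state, searching all reachable configurations."""
    frontier = [(start, 0, z)]                # (state, input position, stack)
    seen = set(frontier)
    while frontier:
        state, pos, stack = frontier.pop()
        if pos == len(w) and state in finals:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        options = [("", pos)]                 # a lambda-move is always a candidate
        if pos < len(w):
            options.append((w[pos], pos + 1))
        for a, new_pos in options:
            for nxt, push in delta.get((state, a, top), ()):
                cfg = (nxt, new_pos, push + rest)
                if cfg not in seen:
                    seen.add(cfg)
                    frontier.append(cfg)
    return False

# Example: an npda for {a^n b^n : n >= 0} and a dfa for "even number of a's".
DELTA1 = {
    ("q0", "a", "z"): {("q0", "az")}, ("q0", "a", "a"): {("q0", "aa")},
    ("q0", "b", "a"): {("q1", "")},   ("q1", "b", "a"): {("q1", "")},
    ("q0", "", "z"): {("q2", "z")},   ("q1", "", "z"): {("q2", "z")},
}
DELTA2 = {("even", "a"): "odd", ("odd", "a"): "even",
          ("even", "b"): "even", ("odd", "b"): "odd"}
PRODUCT = product(DELTA1, DELTA2, {"even", "odd"})
```

With start state ("q0", "even") and final states {("q2", "even")}, the product machine accepts aabb but rejects ab, since the latter contains an odd number of a's.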


The property addressed by this theorem is called closure under regular intersection. Because of 
the result of the theorem, we say that the family of context-free languages is closed under regular 
intersection. This closure property is sometimes useful for simplifying arguments in connection with 
specific languages. 


Example 8.7 


Show that the language 
$L = \{a^n b^n : n \ge 0, n \neq 100\}$ 


is context-free. 


It is possible to prove this claim by constructing a pda or a context-free grammar for the language, 
but the process is tedious. We can get a much neater argument with Theorem 8.5. 


Let 
$L_1 = \{a^{100} b^{100}\}.$ 


Then, because $L_1$ is finite, it is regular. Also, it is easy to see that 


$L = \{a^n b^n : n \ge 0\} \cap \overline{L_1}.$ 


Therefore, by the closure of regular languages under complementation and the closure of context-free 
languages under regular intersection, the desired result follows. 


Example 8.8 


Show that the language 


$L = \{w \in \{a, b, c\}^* : n_a(w) = n_b(w) = n_c(w)\}$ 


is not context-free. 


The pumping lemma can be used for this, but again we can get a much shorter argument using 
closure under regular intersection. Suppose that L were context-free. Then 


$L \cap L(a^* b^* c^*) = \{a^n b^n c^n : n \ge 0\}$ 
would also be context-free. But we already know that this is not so. We conclude that L is not 


context-free. 


Closure properties of languages play an important role in the theory of formal languages and many 
more closure properties for context-free languages can be established. Some additional results are 
explored in the exercises at the end of this section. 


Some Decidable Properties of Context-Free Languages 


By putting together Theorems 5.2 and 6.6, we have already established the existence of a membership 
algorithm for context-free languages. This is of course an essential feature of any language family 
useful in practice. Other simple properties of context-free languages can also be determined. For the 
purpose of this discussion, we assume that the language is described by its grammar. 


Theorem 8.6 


Given a context-free grammar G = (V, T, S, P), there exists an algorithm for deciding whether or not L 
(G) is empty. 

Proof: For simplicity, assume that $\lambda \notin L(G)$. Slight changes have to be made in the argument if this is 
not so. We use the algorithm for removing useless symbols and productions. If S is found to be 
useless, then L (G) is empty; if not, then L (G) contains at least one element. ■ 
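
The part of the useless-symbol algorithm that matters here is finding the variables that can derive a terminal string. A minimal sketch follows, under the same assumed representation as before: a dictionary from variables to lists of right sides, with every variable present as a key (possibly with an empty list) and every other symbol treated as a terminal. The function names are hypothetical.

```python
def generating_variables(grammar):
    """Variables from which some terminal string can be derived."""
    generating = set()
    changed = True
    while changed:                      # repeat until no new variable is added
        changed = False
        for var, right_sides in grammar.items():
            if var in generating:
                continue
            # var is generating if some right side uses only terminals
            # and variables already known to be generating
            if any(all(s in generating or s not in grammar for s in rhs)
                   for rhs in right_sides):
                generating.add(var)
                changed = True
    return generating

def is_empty(grammar, start):
    """L(G) is empty exactly when the start variable is not generating."""
    return start not in generating_variables(grammar)
```

This sketch treats a variable that derives $\lambda$ as generating, so it does not rely on the simplifying assumption made at the start of the proof.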


Theorem 8.7 


Given a context-free grammar G = (V, T, S, P), there exists an algorithm for determining whether or 
not L (G) is infinite. 

Proof: We assume that G contains no $\lambda$-productions, no unit-productions, and no useless symbols. 


Suppose the grammar has a repeating variable in the sense that there exists some $A \in V$ for which 
there is a derivation 


$A \overset{*}{\Rightarrow} xAy.$ 


Since G is assumed to have no $\lambda$-productions and no unit-productions, x and y cannot be 
simultaneously empty. Since A is neither nullable nor a useless symbol, we have 


$S \overset{*}{\Rightarrow} uAv \overset{*}{\Rightarrow} w$ 

and 

$A \overset{*}{\Rightarrow} z,$ 


where u, v, and z are in $T^*$. But then 

$S \overset{*}{\Rightarrow} uAv \overset{*}{\Rightarrow} u x^n A y^n v \overset{*}{\Rightarrow} u x^n z y^n v$ 


is possible for all n, so that L (G) is infinite. 


If no variable can ever repeat, then the length of any derivation is bounded by |V|. In that case, L 
(G) is finite. 
Thus, to get an algorithm for determining whether L (G) is finite, we need only to determine 


whether the grammar has some repeating variables. This can be done simply by drawing a 
dependency graph for the variables in such a way that there is an edge (A, B) whenever there is a 
corresponding production 


$A \to xBy.$ 


Then any variable that is at the base of a cycle is a repeating one. Consequently, the grammar has a 
repeating variable if and only if the dependency graph has a cycle. 


Since we now have an algorithm for deciding whether a grammar has a repeating variable, we 
have an algorithm for determining whether or not L (G) is infinite. ■ 
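
The cycle test on the dependency graph can be sketched with a standard depth-first search. As in the proof, the sketch assumes that the grammar has already been stripped of $\lambda$-productions, unit-productions, and useless symbols; without that preprocessing a cycle need not imply an infinite language. The dictionary representation and function names are again our own assumptions.

```python
def has_repeating_variable(grammar):
    """True if the dependency graph of the variables contains a cycle."""
    # edge (A, B) whenever some production A -> xBy exists
    edges = {A: {s for rhs in right_sides for s in rhs if s in grammar}
             for A, right_sides in grammar.items()}
    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited, on the current path, finished
    color = dict.fromkeys(grammar, WHITE)

    def visit(A):
        color[A] = GRAY
        for B in edges[A]:
            if color[B] == GRAY:        # B is on the current path: a cycle
                return True
            if color[B] == WHITE and visit(B):
                return True
        color[A] = BLACK
        return False

    return any(color[A] == WHITE and visit(A) for A in grammar)

def is_infinite(grammar):
    """Valid only for grammars without lambda-productions, unit-productions,
    and useless symbols, as assumed in the proof of Theorem 8.7."""
    return has_repeating_variable(grammar)
```

For the grammar with productions S → aSb | ab the graph has the single edge (S, S), so the language is reported as infinite; a grammar such as S → AB, A → a | aa, B → b has no cycle and generates a finite language.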


Somewhat surprisingly, other simple properties of context-free languages are not so easily dealt 
with. As in Theorem 4.7, we might look for an algorithm to determine whether two context-free 
grammars generate the same language. But it turns out that there is no such algorithm. For the moment, 
we do not have the technical machinery for properly defining the meaning of “there is no algorithm,” 
but its intuitive meaning is clear. This is an important point to which we will return later. 


EXERCISES 


1. Is the complement of the language in Example 8.8 context-free? 

2. Consider the language $L_1$ in Theorem 8.4. Show that this language is linear. 

3. Show that the family of context-free languages is closed under homomorphism. 

4. Show that the family of linear languages is closed under homomorphism. 

5. Show that the family of context-free languages is closed under reversal. 

6. Which of the language families we have discussed are not closed under reversal? 

7. Show that the family of context-free languages is not closed under difference in general, but is 
closed under regular difference, that is, if $L_1$ is context-free and $L_2$ is regular, then $L_1 - L_2$ is 
context-free. 


8. Show that the family of deterministic context-free languages is closed under regular difference. 
9. Show that the family of linear languages is closed under union, but not closed under concatenation. 
10. Show that the family of linear languages is not closed under intersection. 


11. Show that the family of deterministic context-free languages is not closed under union and 
intersection. 


12. Give an example of a context-free language whose complement is not context-free. 


*13. Show that if $L_1$ is linear and $L_2$ is regular, then $L_1L_2$ is a linear language. 


14. Show that the family of unambiguous context-free languages is not closed under union. 

15. Show that the family of unambiguous context-free languages is not closed under intersection. 

16. Let L be a deterministic context-free language and define a new language 
$L_1 = \{w : aw \in L, a \in \Sigma\}$. Is it necessarily true that $L_1$ is a deterministic context-free language? 

17. Show that the language $L = \{a^nb^n : n \geq 0, n \text{ is not a multiple of } 5\}$ is context-free. 


18. Show that the following language is context-free. 
$L = \{w \in \{a, b\}^* : n_a(w) = n_b(w);\ w \text{ does not contain a substring } aab\}$. 


19. Is the family of deterministic context-free languages closed under homomorphism? 
20. Give the details of the inductive argument in Theorem 8.5. 


21. Give an algorithm which, for any given context-free grammar G, can determine whether or not 
$\lambda \in L(G)$. 


22. Show that there exists an algorithm to determine whether the language generated by some context- 
free grammar contains any words of length less than some given number n. 


23. Let $L_1$ be a context-free language and $L_2$ be regular. Show that there exists an algorithm to 
determine whether or not $L_1$ and $L_2$ have a common element. 


Chapter 9 


Turing 
Machines 


In our discussion so far we have encountered some fundamental ideas, in particular the 
concepts of regular and context-free languages and their association with finite automata and 
pushdown accepters. Our study has revealed that the regular languages form a proper subset of 
the context-free languages and, therefore, that pushdown automata are more powerful than 
finite automata. We also saw that context-free languages, while fundamental to the study of 
programming languages, are limited in scope. This was made clear in the last chapter, where our 
results showed that some simple languages, such as $\{a^nb^nc^n\}$ and $\{ww\}$, are not context-free. This 
prompts us to look beyond context-free languages and investigate how one might define new language 
families that include these examples. To do so, we return to the general picture of an automaton. If we 
compare finite automata with pushdown automata, we see that the nature of the temporary storage 
creates the difference between them. If there is no storage, we have a finite automaton; if the storage 
is a stack, we have the more powerful pushdown automaton. Extrapolating from this observation, we 
can expect to discover even more powerful language families if we give the automaton more flexible 
storage. For example, what would happen if, in the general scheme of Figure 1.3, we used two stacks, 
three stacks, a queue, or some other storage device? Does each storage device define a new kind of 
automaton and through it a new language family? This approach raises a large number of questions, 
most of which turn out to be uninteresting. It is more instructive to ask a more ambitious question and 
consider how far the concept of an automaton can be pushed. What can we say about the most 
powerful of automata and the limits of computation? This leads to the fundamental concept of a 
Turing machine and, in turn, to a precise definition of the idea of a mechanical or algorithmic 
computation. 


We begin our study with a formal definition of a Turing machine, then develop some feeling for 
what is involved by doing some simple programs. Next we argue that, while the mechanism of a 
Turing machine is quite rudimentary, the concept is broad enough to cover very complex processes. 
The discussion culminates in the Turing thesis, which maintains that any computational process, such 
as those carried out by present-day computers, can be done on a Turing machine. 


9.1 The Standard Turing Machine 


Although we can envision a variety of automata with complex and sophisticated storage devices, a 
Turing machine's storage is actually quite simple. It can be visualized as a single, one-dimensional 
array of cells, each of which can hold a single symbol. This array extends indefinitely in both 
directions and is therefore capable of holding an unlimited amount of information. The information 
can be read and changed in any order. We will call such a storage device a tape because it is 
analogous to the magnetic tapes used in older computers. 


Definition of a Turing Machine 


A Turing machine is an automaton whose temporary storage is a tape. This tape is divided into cells, 
each of which is capable of holding one symbol. Associated with the tape is a read-write head that 
can travel right or left on the tape and that can read and write a single symbol on each move. To 
deviate slightly from the general scheme of Chapter 1, the automaton that we use as a Turing machine 
will have neither an input file nor any special output mechanism. Whatever input and output is 
necessary will be done on the machine's tape. We will see later that this modification of our general 
model in Section 1.2 is of little consequence. We could retain the input file and a specific output 
mechanism without affecting any of the conclusions we are about to draw, but we leave them out 
because the resulting automaton is a little easier to describe. 


A diagram giving an intuitive visualization of a Turing machine is shown in Figure 9.1. Definition 
9.1 makes the notion precise. 


Figure 9.1 


Definition 9.1 


A Turing machine M is defined by 
$M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$, 
where 
$Q$ is the set of internal states, 
$\Sigma$ is the input alphabet, 
$\Gamma$ is the finite set of symbols called the tape alphabet, 
$\delta$ is the transition function, 
$\square \in \Gamma$ is a special symbol called the blank, 
$q_0 \in Q$ is the initial state, 
$F \subseteq Q$ is the set of final states. 


In the definition of a Turing machine, we assume that $\Sigma \subseteq \Gamma - \{\square\}$, that is, that the input alphabet 
is a subset of the tape alphabet, not including the blank. Blanks are ruled out as input for reasons that 
will become apparent shortly. The transition function $\delta$ is defined as 


$\delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$. 


In general, $\delta$ is a partial function on $Q \times \Gamma$; its interpretation gives the principle by which a Turing 
machine operates. The arguments of $\delta$ are the current state of the control unit and the current tape 
symbol being read. The result is a new state of the control unit, a new tape symbol, which replaces 
the old one, and a move symbol, L or R. The move symbol indicates whether the read-write head 
moves left or right one cell after the new symbol has been written on the tape. 


Example 9.1 


Figure 9.2 shows the situation before and after the move 
$\delta(q_0, a) = (q_1, d, R)$. 


Figure 9.2 The situation (a) before the move, with internal state $q_0$, and (b) after the move, with 
internal state $q_1$. 


We can think of a Turing machine as a rather simple computer. It has a processing unit, which has 
a finite memory, and in its tape, it has a secondary storage of unlimited capacity. The instructions that 
such a computer can carry out are very limited: It can sense a symbol on its tape and use the result to 
decide what to do next. The only actions the machine can perform are to rewrite the current symbol, 
to change the state of the control, and to move the read-write head. This small instruction set may 
seem inadequate for doing complicated things, but this is not so. Turing machines are quite powerful 
in principle. The transition function $\delta$ defines how this computer acts, and we often call it the 
“program” of the machine. 


As always, the automaton starts in the given initial state with some information on the tape. It then 
goes through a sequence of steps controlled by the transition function $\delta$. During this process, the 
contents of any cell on the tape may be examined and changed many times. Eventually, the whole 
process may terminate, which we achieve in a Turing machine by putting it into a halt state. A Turing 
machine is said to halt whenever it reaches a configuration for which $\delta$ is not defined; this is possible 
because $\delta$ is a partial function. In fact, we will assume that no transitions are defined for any final 
state, so the Turing machine will halt whenever it enters a final state. 


Example 9.2 


Consider the Turing machine defined by 
$Q = \{q_0, q_1\}$, 
$\Sigma = \{a, b\}$, 
$\Gamma = \{a, b, \square\}$, 
$F = \{q_1\}$, 
and 
$\delta(q_0, a) = (q_0, b, R)$, 
$\delta(q_0, b) = (q_0, b, R)$, 
$\delta(q_0, \square) = (q_1, \square, L)$. 


If this Turing machine is started in state $q_0$ with the symbol a under the read-write head, the 
applicable transition rule is $\delta(q_0, a) = (q_0, b, R)$. Therefore, the read-write head will replace the a with 
a b, then move right on the tape. The machine will remain in state $q_0$. Any subsequent a will also be 
replaced with a b, but b's will not be modified. When the machine encounters the first blank, it will 
move left one cell, then halt in final state $q_1$. 
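The behavior just described can be checked with a small simulation. The following Python sketch is an illustration only: the function name `run`, the use of `"_"` for the blank, and the step limit are assumptions made here, not part of the text. The transition table is the one given in Example 9.2.

```python
BLANK = "_"  # stands in for the blank symbol


def run(delta, tape_input, state="q0", max_steps=10_000):
    """Run a standard Turing machine until delta is undefined (a halt)."""
    tape = dict(enumerate(tape_input))  # sparse tape; missing cells are blank
    head = 0                            # head starts on the leftmost input symbol
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:            # no move defined: the machine halts
            break
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    else:
        raise RuntimeError("gave up: no halt detected within max_steps moves")
    used = [i for i, s in tape.items() if s != BLANK]
    content = ""
    if used:
        content = "".join(tape.get(i, BLANK) for i in range(min(used), max(used) + 1))
    return state, head, content


# Transition function of Example 9.2.
delta = {
    ("q0", "a"): ("q0", "b", "R"),
    ("q0", "b"): ("q0", "b", "R"),
    ("q0", BLANK): ("q1", BLANK, "L"),
}

print(run(delta, "aab"))
```

The returned triple gives the halting state, the head position (counted from the first input cell), and the nonblank part of the tape.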


Figure 9.3 shows several stages of the process for a simple initial configuration. 


Figure 9.3 A sequence of moves. 


As before, we can use transition graphs to represent Turing machines. Now we label the edges of 
the graph with three items: the current tape symbol, the symbol that replaces it, and the direction in 
which the read-write head is to move. The Turing machine in Example 9.2 is represented by the 
transition graph in Figure 9.4. 


Figure 9.4 


Example 9.3 


Look at the Turing machine in Figure 9.5. To see what will happen, we can trace a typical case. 
Suppose that the tape initially contains ab..., with the read-write head on the a. The machine then 
reads the a, but does not change it. Its next state is $q_1$ and the read-write head moves right, so that it is 
now over the b. This symbol is also read and left unchanged. The machine goes back into state $q_0$ and 
the read-write head moves left. We are now back exactly in the original state, and the sequence of 
moves starts again. It is clear from this that the machine, whatever the initial information on its tape, 
will run forever, with the read-write head moving alternately right then left, but making no 
modifications to the tape. This is an instance of a Turing machine that does not halt. In analogy with 
programming terminology, we say that the Turing machine is in an infinite loop. 
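A simulation can make the loop visible. The sketch below is hedged in two ways. First, Figure 9.5 is not reproduced here, so the transition table is an assumption reconstructed from the description: in $q_0$ the machine leaves any symbol unchanged, enters $q_1$, and moves right; in $q_1$ it leaves the symbol unchanged, returns to $q_0$, and moves left. Second, detecting a repeated configuration is sufficient to show that a deterministic machine never halts, but it is not a general test, since a machine can also run forever without ever repeating a configuration.

```python
BLANK = "_"  # stands in for the blank symbol


def classify(delta, tape_input, state="q0", max_steps=1000):
    """Return 'halts', 'loops' (a configuration repeated), or 'unknown'."""
    tape = dict(enumerate(tape_input))
    head = 0
    seen = set()
    for _ in range(max_steps):
        # A configuration: state, head position, and nonblank tape contents.
        config = (state, head,
                  frozenset((i, s) for i, s in tape.items() if s != BLANK))
        if config in seen:
            return "loops"
        seen.add(config)
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            return "halts"
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    return "unknown"


# Assumed reconstruction of the machine in Figure 9.5.
delta = {}
for s in ("a", "b", BLANK):
    delta[("q0", s)] = ("q1", s, "R")
    delta[("q1", s)] = ("q0", s, "L")

print(classify(delta, "ab"))
```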


Figure 9.5 


Since one can make several different definitions of a Turing machine, it is worthwhile to 
summarize the main features of our model, which we will call a standard Turing machine: 


1. The Turing machine has a tape that is unbounded in both directions, allowing any number of left 
and right moves. 


2. The Turing machine is deterministic in the sense that $\delta$ defines at most one move for each 
configuration. 


3. There is no special input file. We assume that at the initial time the tape has some specified 
content. Some of this may be considered input. Similarly, there is no special output device. 
Whenever the machine halts, some or all of the contents of the tape may be viewed as output. 


These conventions were chosen primarily for the convenience of subsequent discussion. In 
Chapter 10, we will look at other versions of Turing machines and discuss their relation to our 
standard model. 


Here, as in the case of pda's, the most convenient way to exhibit a sequence of configurations of a 
Turing machine uses the idea of an instantaneous description. Any configuration is completely 
determined by the current state of the control unit, the contents of the tape, and the position of the 
read-write head. We will use the notation in which 


$x_1qx_2$ 


or 


$a_1a_2 \cdots a_{k-1}qa_ka_{k+1} \cdots a_n$ 


is the instantaneous description of a machine in state $q$ with the tape depicted in Figure 9.6. The 
symbols $a_1, \ldots, a_n$ show the tape contents, while $q$ defines the state of the control unit. This convention 
is chosen so that the position of the read-write head is over the cell containing the symbol 
immediately following $q$. 


Figure 9.6 (internal state $q$) 


The instantaneous description gives only a finite amount of information to the right and left of the 
read-write head. The unspecified part of the tape is assumed to contain all blanks; normally such 
blanks are irrelevant and are not shown explicitly in the instantaneous description. If the position of 
blanks is relevant to the discussion, however, the blank symbol may appear in the instantaneous 
description. For example, the instantaneous description $q\square w$ indicates that the read-write head is on 
the cell to the immediate left of the first symbol of w and that this cell contains a blank. 


Example 9.4 


The pictures drawn in Figure 9.3 correspond to the sequence of instantaneous descriptions $q_0aa$, 
$bq_0a$, $bbq_0\square$, $bq_1b$. 


A move from one configuration to another will be denoted by $\vdash$. Thus, if 
$\delta(q_1, c) = (q_2, e, R)$, 
then the move 


$abq_1cd \vdash abeq_2d$ 


is made whenever the internal state is $q_1$, the tape contains abcd, and the read-write head is on the c. 


The symbol $\vdash^*$ has the usual meaning of an arbitrary number of moves. Subscripts, such as $\vdash_M$, are 
used in arguments to distinguish between several machines. 


Example 9.5 


The action of the Turing machine in Figure 9.3 can be represented by 


$q_0aa \vdash bq_0a \vdash bbq_0\square \vdash bq_1b$ 


or 


$q_0aa \vdash^* bq_1b$. 


For further discussion, it is convenient to summarize the various observations just made in a 
formal way. 


Definition 9.2 


Let $M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$ be a Turing machine. Then any string $a_1 \cdots a_{k-1}q_1a_ka_{k+1} \cdots a_n$, with $a_i \in \Gamma$ 
and $q_1 \in Q$, is an instantaneous description of M. A move 


$a_1 \cdots a_{k-1}q_1a_ka_{k+1} \cdots a_n \vdash a_1 \cdots a_{k-1}bq_2a_{k+1} \cdots a_n$ 


is possible if and only if 
$\delta(q_1, a_k) = (q_2, b, R)$. 


A move 


$a_1 \cdots a_{k-1}q_1a_ka_{k+1} \cdots a_n \vdash a_1 \cdots q_2a_{k-1}ba_{k+1} \cdots a_n$ 


is possible if and only if 
$\delta(q_1, a_k) = (q_2, b, L)$. 


M is said to halt starting from some initial configuration $x_1q_ix_2$ if 


$x_1q_ix_2 \vdash^* y_1q_jay_2$ 


for any $q_j$ and $a$, for which $\delta(q_j, a)$ is undefined. The sequence of configurations leading to a halt state 
will be called a computation. 
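The two kinds of moves in Definition 9.2 translate directly into code. The Python sketch below is only an illustration: the names `describe` and `computation`, the `"_"` blank, and the bracketed state in the printed description are choices made here for readability. It lists the instantaneous descriptions of a computation, using the machine of Example 9.2 as test data.

```python
BLANK = "_"  # stands in for the blank symbol


def describe(tape, head, state):
    """Format an instantaneous description; the state precedes the scanned cell."""
    used = [i for i, s in tape.items() if s != BLANK] + [head]
    left = "".join(tape.get(i, BLANK) for i in range(min(used), head))
    right = "".join(tape.get(i, BLANK) for i in range(head, max(used) + 1))
    return f"{left}[{state}]{right}"


def computation(delta, tape_input, state="q0", max_steps=1000):
    """Return the list of instantaneous descriptions up to the halt."""
    tape, head = dict(enumerate(tape_input)), 0
    ids = [describe(tape, head, state)]
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:        # delta undefined: halting configuration
            return ids
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
        ids.append(describe(tape, head, state))
    raise RuntimeError("gave up: no halt detected within max_steps moves")


# Machine of Example 9.2.
delta = {
    ("q0", "a"): ("q0", "b", "R"),
    ("q0", "b"): ("q0", "b", "R"),
    ("q0", BLANK): ("q1", BLANK, "L"),
}

print(" |- ".join(computation(delta, "aa")))
```

For the input aa this reproduces the four descriptions listed in Example 9.4.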


Example 9.3 shows the possibility that a Turing machine will never halt, proceeding in an endless 
loop from which it cannot escape. This situation plays a fundamental role in the discussion of Turing 
machines, so we use a special notation for it. We will represent it by 


$x_1qx_2 \vdash^* \infty$, 


indicating that, starting from the initial configuration $x_1qx_2$, the machine goes into a loop and never halts. 


Turing Machines as Language Accepters 


Turing machines can be viewed as accepters in the following sense. A string w is written on the tape, 
with blanks filling out the unused portions. The machine is started in the initial state $q_0$ with the read- 
write head positioned on the leftmost symbol of w. If, after a sequence of moves, the Turing machine 
enters a final state and halts, then w is considered to be accepted. 


Definition 9.3 


Let $M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$ be a Turing machine. Then the language accepted by M is 


$L(M) = \{w \in \Sigma^+ : q_0w \vdash^* x_1q_fx_2 \text{ for some } q_f \in F,\ x_1, x_2 \in \Gamma^*\}$. 


This definition indicates that the input w is written on the tape with blanks on either side. The 
reason for excluding blanks from the input now becomes clear: It assures us that all the input is 
restricted to a well-defined region of the tape, bracketed by blanks on the right and left. Without this 
convention, the machine could not limit the region in which it must look for the input; no matter how 
many blanks it saw, it could never be sure that there was not some nonblank input somewhere else on 
the tape. 


Definition 9.3 tells us what must happen when $w \in L(M)$. It says nothing about the outcome for 
any other input. When w is not in L (M), one of two things can happen: The machine can halt in a 
nonfinal state or it can enter an infinite loop and never halt. Any string for which M does not halt is by 
definition not in L(M). 


Example 9.6 


For $\Sigma = \{0, 1\}$, design a Turing machine that accepts the language denoted by the regular expression 
$00^*$. 
This is an easy exercise in Turing machine programming. Starting at the left end of the input, we 
read each symbol and check that it is a 0. If it is, we continue by moving right. If we reach a blank 
without encountering anything but 0, we terminate and accept the string. If the input contains a 1 
anywhere, the string is not in $L(00^*)$, and we halt in a nonfinal state. To keep track of the computation, 
two internal states $Q = \{q_0, q_1\}$ and one final state $F = \{q_1\}$ are sufficient. As transition function we can 
take 


$\delta(q_0, 0) = (q_0, 0, R)$, 
$\delta(q_0, \square) = (q_1, \square, R)$. 


As long as a 0 appears under the read-write head, the head will move to the right. If at any time a 
1 is read, the machine will halt in the nonfinal state $q_0$, since $\delta(q_0, 1)$ is undefined. Note that the Turing 
machine also halts in a final state if started in state $q_0$ on a blank. We could interpret this as 
acceptance of $\lambda$, but for technical reasons the empty string is not included in Definition 9.3. 
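The two-transition machine of Example 9.6 is small enough to check mechanically. The Python sketch below is illustrative only: the helper `accepts`, the `"_"` blank, and the step limit are assumptions introduced here. Acceptance means halting in a final state, as in Definition 9.3.

```python
BLANK = "_"  # stands in for the blank symbol


def accepts(delta, finals, w, state="q0", max_steps=10_000):
    """Return True if the machine halts in a final state on input w."""
    tape = dict(enumerate(w))
    head = 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:            # halt: accept iff the state is final
            return state in finals
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    raise RuntimeError("gave up: no halt detected within max_steps moves")


# Transition function of Example 9.6; the only final state is q1.
delta = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", BLANK): ("q1", BLANK, "R"),
}

for w in ["0", "000", "0100", "1"]:
    print(w, accepts(delta, {"q1"}, w))
```

Strings containing a 1 stop the machine in the nonfinal state q0, because no transition is defined for the pair (q0, 1).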


The recognition of more complicated languages is more difficult. Since Turing machines have a 
primitive instruction set, the computations that we can program easily in a higher-level language are 
often cumbersome on a Turing machine. Still, it is possible, and the concept is easy to understand, as 
the next examples illustrate. 


Example 9.7 


For $\Sigma = \{a, b\}$, design a Turing machine that accepts 
$L = \{a^nb^n : n \geq 1\}$. 


Intuitively, we solve the problem in the following fashion. Starting at the leftmost a, we check it off 
by replacing it with some symbol, say x. We then let the read-write head travel right to find the 
leftmost b, which in turn is checked off by replacing it with another symbol, say y. After that, we go 
left again to the leftmost a, replace it with an x, then move to the leftmost b and replace it with y, and 
so on. Traveling back and forth this way, we match each a with a corresponding b. If after some time 
no a's or b's remain, then the string must be in L. 


Working out the details, we arrive at a complete solution for which $Q = \{q_0, q_1, q_2, q_3, q_4\}$, $F = \{q_4\}$, 
$\Sigma = \{a, b\}$, $\Gamma = \{a, b, x, y, \square\}$. The transitions can be broken into several parts. The set 


$\delta(q_0, a) = (q_1, x, R)$, 
$\delta(q_1, a) = (q_1, a, R)$, 
$\delta(q_1, y) = (q_1, y, R)$, 
$\delta(q_1, b) = (q_2, y, L)$, 


replaces the leftmost a with an x, then causes the read-write head to travel right to the first b, 
replacing it with a y. When the y is written, the machine enters a state $q_2$, indicating that an a has been 
successfully paired with a b. 


The next set of transitions reverses the direction until an x is encountered, repositions the read-write 
head over the leftmost a, and returns control to the initial state. 


$\delta(q_2, y) = (q_2, y, L)$, 
$\delta(q_2, a) = (q_2, a, L)$, 
$\delta(q_2, x) = (q_0, x, R)$. 


We are now back in the initial state $q_0$, ready to deal with the next a and b. 


After one pass through this part of the computation, the machine will have carried out the partial 
computation 


$q_0aa \cdots abb \cdots b \vdash^* xq_0a \cdots ayb \cdots b$, 


so that a single a has been matched with a single b. After two passes, we will have completed the 
partial computation 


$q_0aa \cdots abb \cdots b \vdash^* xxq_0 \cdots ayy \cdots b$, 


and so on, indicating that the matching process is being carried out properly. 


When the input is a string $a^nb^n$, the rewriting continues this way, stopping only when there are no 
more a’s to be erased. When looking for the leftmost a, the read-write head travels left with the 
machine in state $q_2$. When an x is encountered, the direction is reversed to get the a. But now, instead 
of finding an a it will find a y. To terminate, a final check is made to see if all a’s and b’s have been 
replaced (to detect input where an a follows a b). This can be done by 


$\delta(q_0, y) = (q_3, y, R)$, 
$\delta(q_3, y) = (q_3, y, R)$, 
$\delta(q_3, \square) = (q_4, \square, R)$. 


If we input a string not in the language, the computation will halt in a nonfinal state. For example, 
if we give the machine a string $a^nb^m$, with $n > m$, the machine will eventually encounter a blank in 
state $q_1$. It will halt because no transition is specified for this case. Other input not in the language 
will also lead to a nonfinal halting state (see Exercise 3 at the end of this section). 
The particular input aabb gives the following successive instantaneous descriptions: 


$q_0aabb \vdash xq_1abb \vdash xaq_1bb \vdash xq_2ayb$ 
$\vdash q_2xayb \vdash xq_0ayb \vdash xxq_1yb$ 
$\vdash xxyq_1b \vdash xxq_2yy \vdash xq_2xyy$ 
$\vdash xxq_0yy \vdash xxyq_3y \vdash xxyyq_3\square$ 
$\vdash xxyy\square q_4\square$. 


At this point the Turing machine halts in a final state, so the string aabb is accepted. 
You are urged to trace this program with several more strings in L, as well as with some not in L. 
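Such traces can also be run mechanically. The Python sketch below is an illustration under stated assumptions: the helper `accepts`, the `"_"` blank, and the step limit are introduced here and are not part of the text. The transition table collects the ten transitions of Example 9.7, with $\delta(q_1, b) = (q_2, y, L)$ moving left, as the trace for aabb requires.

```python
BLANK = "_"  # stands in for the blank symbol


def accepts(delta, finals, w, state="q0", max_steps=10_000):
    """Return True if the machine halts in a final state on input w."""
    tape = dict(enumerate(w))
    head = 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:            # halt: accept iff the state is final
            return state in finals
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    raise RuntimeError("gave up: no halt detected within max_steps moves")


# Transition function of Example 9.7; the only final state is q4.
delta = {
    ("q0", "a"): ("q1", "x", "R"),   # check off the leftmost a
    ("q1", "a"): ("q1", "a", "R"),   # travel right to the first b
    ("q1", "y"): ("q1", "y", "R"),
    ("q1", "b"): ("q2", "y", "L"),   # check off that b
    ("q2", "y"): ("q2", "y", "L"),   # travel left to the nearest x
    ("q2", "a"): ("q2", "a", "L"),
    ("q2", "x"): ("q0", "x", "R"),
    ("q0", "y"): ("q3", "y", "R"),   # no a left: verify only y's remain
    ("q3", "y"): ("q3", "y", "R"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}

for w in ["ab", "aabb", "aab", "abb", "aba"]:
    print(w, accepts(delta, {"q4"}, w))
```

The rejected inputs all halt in a nonfinal state; for instance, aab stops in q1 on a blank and aba stops in q3 on an a.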


Example 9.8 


Design a Turing machine that accepts 
$L = \{a^nb^nc^n : n \geq 1\}$. 


The ideas used in Example 9.7 are easily carried over to this case. We match each a, b, and c by 
replacing them in order by x, y, and z, respectively. At the end, we check that all original symbols 
have been rewritten. Although conceptually a simple extension of the previous example, writing the 
actual program is tedious. We leave it as a somewhat lengthy, but straightforward exercise. Notice 
that even though $\{a^nb^n\}$ is a context-free language and $\{a^nb^nc^n\}$ is not, they can be accepted by 
Turing machines with very similar structures. 


One conclusion we can draw from this example is that a Turing machine can recognize some 
languages that are not context-free, a first indication that Turing machines are more powerful than 
pushdown automata. 


Turing Machines as Transducers 


We have had little reason so far to study transducers; in language theory, accepters are quite 
adequate. But as we will shortly see, Turing machines are not only interesting as language accepters, 
they also provide us with a simple abstract model for digital computers in general. Since the primary 
purpose of a computer is to transform input into output, it acts as a transducer. If we want to model 
computers using Turing machines, we have to look at this aspect more closely. 


The input for a computation will be all the nonblank symbols on the tape at the initial time. At the 
conclusion of the computation, the output will be whatever is then on the tape. Thus, we can view a 
Turing machine transducer M as an implementation of a function f defined by 
$\hat{w} = f(w)$, 
provided that 


$q_0w \vdash^*_M q_f\hat{w}$, 


for some final state $q_f$. 


Definition 9.4 


A function f with domain D is said to be Turing-computable or just computable if there exists some 
Turing machine $M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$ such that 


$q_0w \vdash^*_M q_ff(w), \quad q_f \in F$, 


for all $w \in D$. 


As we will shortly claim, all the common mathematical functions, no matter how complicated, are 
Turing-computable. We start by looking at some simple operations, such as addition and arithmetic 
comparison. 


Example 9.9 


Given two positive integers x and y, design a Turing machine that computes x + y. 
We first have to choose some convention for representing positive integers. For simplicity, we 
will use unary notation in which any positive integer x is represented by $w(x) \in \{1\}^*$, such that 


$|w(x)| = x$. 


We must also decide how x and y are placed on the tape initially and how their sum is to appear at 
the end of the computation. We will assume that w(x) and w(y) are on the tape in unary notation, 
separated by a single 0, with the read-write head on the leftmost symbol of w(x). After the 
computation, w(x + y) will be on the tape followed by a single 0, and the read-write head will be 
positioned at the left end of the result. We therefore want to design a Turing machine for performing 
the computation 


$q_0w(x)0w(y) \vdash^* q_fw(x + y)0$, 


where $q_f$ is a final state. Constructing a program for this is relatively simple. All we need to do is to 
move the separating 0 to the right end of w(y), so that the addition amounts to nothing more than the 
coalescing of the two strings. To achieve this, we construct $M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$, with 
$Q = \{q_0, q_1, q_2, q_3, q_4\}$, $F = \{q_4\}$, and 


$\delta(q_0, 1) = (q_0, 1, R)$, 
$\delta(q_0, 0) = (q_1, 1, R)$, 
$\delta(q_1, 1) = (q_1, 1, R)$, 
$\delta(q_1, \square) = (q_2, \square, L)$, 
$\delta(q_2, 1) = (q_3, 0, L)$, 
$\delta(q_3, 1) = (q_3, 1, L)$, 
$\delta(q_3, \square) = (q_4, \square, R)$. 


Note that in moving the 0 right we temporarily create an extra 1, a fact that is remembered by putting 
the machine into state $q_1$. The transition $\delta(q_2, 1) = (q_3, 0, L)$ is needed to remove this at the end of the 
computation. This can be seen from the sequence of instantaneous descriptions for adding 111 to 11: 
$q_0111011 \vdash 1q_011011 \vdash 11q_01011 \vdash 111q_0011$ 
$\vdash 1111q_111 \vdash 11111q_11 \vdash 111111q_1\square$ 
$\vdash 11111q_21 \vdash 1111q_310$ 
$\vdash^* q_3\square 111110 \vdash q_4111110$. 


Unary notation, although cumbersome for practical computations, is very convenient for programming 
Turing machines. The resulting programs are much shorter and simpler than if we had used another 
representation, such as binary or decimal. 
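The adder of Example 9.9 can be run directly. In the Python sketch below, the helper `run`, the `"_"` blank, and the step limit are assumptions made for illustration. The transition table is the one listed in the example, with $\delta(q_2, 1) = (q_3, 0, L)$ moving left, as in the trace.

```python
BLANK = "_"  # stands in for the blank symbol


def run(delta, tape_input, state="q0", max_steps=10_000):
    """Run until delta is undefined; return (state, head, nonblank tape)."""
    tape = dict(enumerate(tape_input))
    head = 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            break
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    else:
        raise RuntimeError("gave up: no halt detected within max_steps moves")
    used = [i for i, s in tape.items() if s != BLANK]
    content = ""
    if used:
        content = "".join(tape.get(i, BLANK) for i in range(min(used), max(used) + 1))
    return state, head, content


# Transition function of Example 9.9; q4 is the final state.
delta = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),     # overwrite the separator with a 1
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),
    ("q2", "1"): ("q3", "0", "L"),     # turn the last 1 into the trailing 0
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}

print(run(delta, "111011"))  # 3 + 2 in unary
```

The machine halts in q4 with the head back on the leftmost 1 and five 1's followed by a single 0 on the tape.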


Adding numbers is one of the fundamental operations of any computer, one that plays a part in the 
synthesis of more complicated instructions. Other basic operations are copying strings and simple 
comparisons. These can also be done easily on a Turing machine. 


Example 9.10 


Design a Turing machine that copies strings of 1’s. More precisely, find a machine that performs the 
computation 


$q_0w \vdash^* q_fww$, 


for any $w \in \{1\}^*$. 


To solve the problem, we implement the following intuitive process: 
1. Replace every 1 by an x. 
2. Find the rightmost x and replace it with 1. 
3. Travel to the right end of the current nonblank region and create a 1 there. 
4. Repeat Steps 2 and 3 until there are no more x's. 
The solution is shown in the transition graph in Figure 9.7. It may be a little hard to see at first that the 
solution is correct, so let us trace the program with the simple string 11. The computation performed 
in this case is 


$q_011 \vdash xq_01 \vdash xxq_0\square \vdash xq_1x$ 
$\vdash x1q_2\square \vdash xq_111 \vdash q_1x11$ 
$\vdash 1q_211 \vdash 11q_21 \vdash 111q_2\square$ 
$\vdash 11q_111 \vdash 1q_1111$ 
$\vdash q_11111 \vdash q_1\square 1111 \vdash q_31111$. 


Figure 9.7 

Example 9.11 


Let x and y be two positive integers represented in unary notation. Construct a Turing machine that 
will halt in a final state $q_y$ if $x \geq y$, and that will halt in a nonfinal state $q_n$ if $x < y$. More specifically, 
the machine is to perform the computation 


$q_0w(x)0w(y) \vdash^* q_yw(x)0w(y)$ if $x \geq y$, 
$q_0w(x)0w(y) \vdash^* q_nw(x)0w(y)$ if $x < y$. 


To solve this problem, we can use the idea in Example 9.7 with some minor modifications. 
Instead of matching a's and b’s, we match each 1 on the left of the dividing 0 with the 1 on the right. 
At the end of the matching, we will have on the tape either 


$xx \ldots 110xx \ldots x\square$ 


or 


$xx \ldots xx0xx \ldots x11\square$, 


depending on whether $x > y$ or $y > x$. In the first case, when we attempt to match another 1, we 
encounter the blank at the right of the working space. This can be used as a signal to enter the state $q_y$. 
In the second case, we still find a 1 on the right when all 1’s on the left have been replaced. We use 
this to get into the other state $q_n$. The complete program for this is straightforward and is left as an 
exercise. 

This example makes the important point that a Turing machine can be programmed to make 
decisions based on arithmetic comparisons. This kind of simple decision is common in the machine 
language of computers, where alternate instruction streams are entered, depending on the outcome of 
an arithmetic operation. 


EXERCISES 


**1. Write a Turing machine simulator in some higher-level programming language. Such a simulator 
should accept as input the description of any Turing machine, together with an initial 
configuration, and should produce as output the result of the computation. 


2. Design a Turing machine with no more than three states that accepts the language L(a (a + b)*). 
Assume that $\Sigma = \{a, b\}$. Is it possible to do this with a two-state machine? 


3. Determine what the Turing machine in Example 9.7 does when presented with the inputs aba and 
aaabbbb. 


4. Is there any input for which the Turing machine in Example 9.7 goes into an infinite loop? 


5. What language is accepted by the Turing machine whose transition graph is in the figure below? 


[Transition graph for Exercise 5] 


6. What happens in Example 9.10 if the string w contains any symbol other than 1? 
7. Construct Turing machines that will accept the following languages on {a, b}. 

(a) $L = L(aba^*b)$. 

(b) $L = \{w : |w| \text{ is even}\}$. 

(c) $L = \{w : |w| \text{ is a multiple of } 3\}$. 

(d) $L = \{a^nb^m : n \geq 1, n \neq m\}$. 

(e) $L = \{w : n_a(w) = n_b(w)\}$. 

(f) $L = \{a^nb^ma^{n+m} : n \geq 0, m \geq 1\}$. 

(g) $L = \{a^nb^na^nb^n : n \geq 0\}$. 

(h) $L = \{a^nb^{2n} : n \geq 1\}$. 


For each problem, write out $\delta$ in complete detail, then check your answers by tracing several test 
examples. 


8. Design a Turing machine that accepts the language 
$L = \{ww : w \in \{a, b\}^+\}$. 
9. Construct a Turing machine to compute the function 
$f(w) = w^R$, 
where $w \in \{0, 1\}^+$. 


10. Design a Turing machine that finds the middle of a string of even length. Specifically, if 
$w = a_1a_2 \ldots a_na_{n+1} \ldots a_{2n}$, with $a_i \in \Sigma$, the Turing machine should produce 
$\hat{w} = a_1a_2 \ldots a_nca_{n+1} \ldots a_{2n}$, where $c \in \Gamma - \Sigma$. 


11. Design Turing machines to compute the following functions for x and y positive integers 
represented in unary. 


(a) $f(x) = 3x$. 
(b) $f(x, y) = x - y$ if $x > y$, 
$f(x, y) = 0$ if $x \leq y$. 
(c) $f(x, y) = 2x + 3y$. 
(d) $f(x) = \frac{x}{2}$ if x is even, 
$f(x) = \frac{x+1}{2}$ if x is odd. 
(e) $f(x) = x \bmod 5$. 
(f) $f(x) = \lfloor \frac{x}{2} \rfloor$, where $\lfloor \frac{x}{2} \rfloor$ denotes the largest integer less than or 
equal to $\frac{x}{2}$. 


12. Design a Turing machine with $\Gamma = \{0, 1, \square\}$ that, when started on any cell containing a blank or a 
1, will halt if and only if its tape has a 0 somewhere on it. 


13. Write out a complete solution for Example 9.8. 


14. Give the sequence of instantaneous descriptions that the Turing machine in Example 9.10 goes 
through when presented with the input 111. What happens when this machine is started with 110 
on its tape? 

15. Give convincing arguments that the Turing machine in Example 9.10 does in fact carry out the 
indicated computation. 


16. Complete all the details in Example 9.11. 


17. Suppose that in Example 9.9 we had decided to represent x and y in binary. Write a Turing 
machine program for doing the indicated computation in this representation. 


18. Sketch how Example 9.9 could be solved if x and y were represented in decimal. 


19. You may have noticed that all the examples in this section had only one final state. Is it generally 
true that for any Turing machine, there exists another one with only one final state that accepts the 
same language? 


20. Definition 9.3 excludes the empty string from any language accepted by a Turing machine. 
Modify the definition so that languages that contain $\lambda$ may be accepted. 


9.2 Combining Turing Machines for Complicated Tasks 


We have shown explicitly how some important operations found in all computers can be done on a 
Turing machine. Since, in digital computers, such primitive operations are the building blocks for 
more complex instructions, let us see how these basic operations can also be put together on a Turing 
machine. To demonstrate how Turing machines can be combined, we follow a practice common in 
programming. We start with a high-level description, then refine it successively until the program is 
in the actual language with which we are working. We can describe Turing machines several ways at 
a high level; block diagrams or pseudocode are the two approaches we will use most frequently in 
subsequent discussions. In a block diagram, we encapsulate computations in boxes whose function is 
described, but whose interior details are not shown. By using such boxes, we implicitly claim that 
they can actually be constructed. As a first example, we combine the machines in Examples 9.9 and 
9.11. 


Example 9.12 


Design a Turing machine that computes the function 


$$f(x, y) = \begin{cases} x + y & \text{if } x \ge y, \\ 0 & \text{if } x < y. \end{cases}$$


For the sake of discussion, assume that x and y are positive integers in unary representation. The 
value zero will be represented by 0, with the rest of the tape blank. 


The computation of f(x, y) can be visualized at a high level by means of the diagram in Figure 
9.8. The diagram shows that we first use a comparing machine, like that in Example 9.11, to 
determine whether or not x ≥ y. If so, the comparer sends a start signal to the adder, which then 
computes x + y. If not, an erasing program is started that changes every 1 to a blank. 


In subsequent discussions, we will often use such high-level, block-diagram representations of 
Turing machines. They are certainly quicker and clearer than the corresponding extensive set of δ’s. 
Before we accept this high-level view, we must justify it. What, for example, is meant by saying that 
the comparer sends a start signal to the adder? There is nothing in Definition 9.1 that offers that 
possibility. Nevertheless, it can be done in a straightforward way. 


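Figure 9.8 [Block diagram: the comparer C takes x, y and starts either the adder A or the eraser E; 
the output is f(x, y).] 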


The program for the comparer C is written as suggested in Example 9.11, using a Turing machine 
having states indexed with C. For the adder, we use the idea in Example 9.9, with states indexed with 
A. For the eraser E, we construct a Turing machine having states indexed with E. The computations to 
be done by C are 


$$q_{C,0}\, w(x)\, 0\, w(y) \vdash^* q_{A,0}\, w(x)\, 0\, w(y) \quad \text{if } x \ge y,$$


and 


$$q_{C,0}\, w(x)\, 0\, w(y) \vdash^* q_{E,0}\, w(x)\, 0\, w(y) \quad \text{if } x < y.$$


If we take $q_{A,0}$ and $q_{E,0}$ as the initial states of A and E, respectively, we see that C starts either A or E. 


The computations performed by the adder will be 


$$q_{A,0}\, w(x)\, 0\, w(y) \vdash^* q_{A,f}\, w(x + y)\, 0,$$

and that of the eraser E will be 

$$q_{E,0}\, w(x)\, 0\, w(y) \vdash^* q_{E,f}\, 0.$$


The result is a single Turing machine that combines the action of C, A, and E as indicated in Figure 
9.8. 
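
The block diagram of Example 9.12 can be mimicked at a high level in Python. This is only a sketch 
under stated assumptions: the names w, comparer, adder, eraser, and f_machine are hypothetical, and 
each function stands in for one box by acting on the tape string w(x)0w(y) directly instead of 
through individual transitions. 

```python
def w(n):
    """Unary representation of a positive integer n."""
    return "1" * n


def comparer(tape):
    """Box C: leave the tape unchanged and report which box to start next."""
    x, y = tape.split("0")
    return "A" if len(x) >= len(y) else "E"


def adder(tape):
    """Box A: turn w(x)0w(y) into w(x+y)0."""
    x, y = tape.split("0")
    return x + y + "0"


def eraser(tape):
    """Box E: change every 1 to a blank; only the 0 that represents zero remains."""
    return "0"


def f_machine(x, y):
    """Combine C, A, and E: C decides whether A or E receives the tape."""
    tape = w(x) + "0" + w(y)
    return adder(tape) if comparer(tape) == "A" else eraser(tape)
```

The point of the sketch is the control flow: C never changes the tape, and its only effect is to 
select the box that runs next, which is what entering the initial state of A or E achieves. 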


Another useful, high-level view of Turing machines involves pseudocode. In computer 
programming, pseudocode is a way of outlining a computation using descriptive phrases whose 
meaning we claim to understand. While this description is not usable on the computer, we assume that 
we can translate it into the appropriate language when needed. One simple kind of pseudocode is 
exemplified by the idea of a macroinstruction, which is a single-statement shorthand for a sequence of 
lower-level statements. We first define the macroinstruction in terms of the lower-level language. We 
then use the macroinstruction in a program with the assumption that the relevant low-level code is 
substituted for each occurrence of the macroinstruction. This idea is very useful in Turing machine 
programming. 


Example 9.13 


Consider the macroinstruction 
if a then $q_j$ else $q_k$, 


with the following interpretation. If the Turing machine reads an a, then regardless of its current state, 
it is to go into state $q_j$ without changing the tape content or moving the read-write head. If the symbol 
read is not an a, the machine is to go into state $q_k$ without changing anything. 


To implement this macroinstruction requires several relatively obvious steps of a Turing machine. 


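$$\delta(q_i, a) = (q_{j0}, a, R) \quad \text{for all } q_i \in Q,$$

$$\delta(q_i, b) = (q_{k0}, b, R) \quad \text{for all } q_i \in Q \text{ and all } b \in \Gamma - \{a\},$$

$$\delta(q_{j0}, c) = (q_j, c, L) \quad \text{for all } c \in \Gamma,$$

$$\delta(q_{k0}, c) = (q_k, c, L) \quad \text{for all } c \in \Gamma.$$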


The states $q_{j0}$ and $q_{k0}$ are new states, introduced to take care of complications arising from the fact 
that in a standard Turing machine the read-write head changes position in each move. In the 
macroinstruction, we want to change the state, but leave the read-write head where it is. We let the 
head move right, but put the machine into a state $q_{j0}$ or $q_{k0}$. This indicates that a left move must be 
made before entering the desired state $q_j$ or $q_k$. 
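
The expansion of this macroinstruction into ordinary transitions can be sketched in Python. The 
function names expand_if_macro and run, the blank symbol "_", and the convention of naming the 
helper states by appending "0" are assumptions made for illustration only. 

```python
BLANK = "_"


def expand_if_macro(states, gamma, a, q_j, q_k):
    """Return standard transitions implementing 'if a then q_j else q_k'.

    The helper states q_j0 and q_k0 are named by appending "0" (our choice).
    """
    q_j0, q_k0 = q_j + "0", q_k + "0"
    delta = {}
    for q in states:
        for b in gamma:
            # Step right into a helper state that remembers the outcome.
            delta[(q, b)] = (q_j0 if b == a else q_k0, b, "R")
    for c in gamma:
        # Step back left, landing on the original cell in the target state.
        delta[(q_j0, c)] = (q_j, c, "L")
        delta[(q_k0, c)] = (q_k, c, "L")
    return delta


def run(delta, state, tape, head, steps):
    """Apply at most `steps` moves; stop early if no transition is defined."""
    tape = dict(enumerate(tape))
    for _ in range(steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            break
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    return state, head
```

After the two moves, the head is back on the cell it started from, so the net effect is a change 
of state only. 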


Going a step further, we can replace macroinstructions with subprograms. Normally, a 
macroinstruction is replaced by actual code at each occurrence, whereas a subprogram is a single 
piece of code that is invoked repeatedly whenever needed. Subprograms are fundamental to high-level 
programming languages, but they can also be used with Turing machines. To make this 
plausible, let us outline briefly how a Turing machine can be used as a subprogram that can be 
invoked repeatedly by another Turing machine. This requires a new feature: the ability to store 
information on the calling program's configuration so the configuration can be recreated on return 
from the subprogram. For example, say machine A in state $q_i$ invokes machine B. When B is finished, 
we would like to resume program A in state $q_i$, with the read-write head (which may have moved 
during B's operation) in its original place. At other times, A may call B from state $q_j$, in which case 
control should return to this state. To solve the control transfer problem, we must be able to pass 
information from A to B and vice versa, be able to recreate A’s configuration when it recovers control 
from B, and ensure that the temporarily suspended computations of A are not affected by the execution 
of B. To do this, we can divide the tape into several regions as shown in Figure 9.9. 


Figure 9.9 [Tape divided by region separators into the workspace for A, the region T, and the 
workspace for B.] 


Before A calls B, it writes the information needed by B (e.g., A’s current state, the arguments for 
B) on the tape in some region T. A then passes control to B by making a transition to the start state of 
B. After transfer, B will use T to find its input. The workspace for B is separate from T and from the 
workspace for A, so no interference can occur. When B is finished, it will return relevant results to 
region T, where A will expect to find them. In this way, the two programs can interact in the required 
fashion. Note that this is very similar to what actually happens in a real computer when a subprogram 
is called. 


We can now program Turing machines in pseudocode, provided that we know (in theory at least) 
how to translate this pseudocode into an actual Turing machine program. 


Example 9.14 
Design a Turing machine that multiplies two positive integers in unary notation. 


A multiplication machine can be constructed by combining the ideas we encountered in adding 
and copying. Let us assume that the initial and final tape contents are to be as indicated in Figure 9.10. 


The process of multiplication can then be visualized as a repeated copying of the multiplicand y for 
each 1 in the multiplier x, whereby the string y is added the appropriate number of times to the 
partially computed product. The following pseudocode shows the main steps of the process. 


1. Repeat the following steps until x contains no more 1’s. Find a 1 in x and replace it with another 
symbol a. Replace the leftmost 0 by 0y. 


2. Replace all a’s with 1’s. 


Although this pseudocode is sketchy, the idea is simple enough that there should be no doubt that 
it can be done. 
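
To make the two pseudocode steps concrete, here is a minimal Python sketch that carries them out 
on a tape held as a list of symbols. The tape layout is an assumption chosen for this illustration 
and is not taken from Figure 9.10: the tape starts as 0 0 w(x) 0 w(y), so that the product 
accumulates between the first two 0's and stays separated from x. The name multiply_unary is 
likewise hypothetical. 

```python
def multiply_unary(x, y):
    """Multiply positive integers x and y following the two pseudocode steps."""
    # Assumed layout (hypothetical): 0 0 w(x) 0 w(y).
    tape = list("00" + "1" * x + "0" + "1" * y)
    y_block = ["1"] * y
    while True:
        # The multiplier x occupies the cells between the 2nd and 3rd 0.
        zeros = [i for i, s in enumerate(tape) if s == "0"]
        x_cells = range(zeros[1] + 1, zeros[2])
        ones = [i for i in x_cells if tape[i] == "1"]
        if not ones:
            break                  # x contains no more 1's
        tape[ones[0]] = "a"        # find a 1 in x and replace it with a
        tape[1:1] = y_block        # replace the leftmost 0 by 0y
    # Step 2: replace all a's with 1's.
    return "".join("1" if s == "a" else s for s in tape)
```

With this layout the final tape reads 0 w(xy) 0 w(x) 0 w(y). A real Turing machine would do the 
insertion of y by shifting or copying cell by cell; the list insertion above hides that work. 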


Figure 9.10 [Initial and final tape contents for the multiplication machine; the labeled blocks 
are x, y, and xy.] 


In spite of the descriptive nature of these examples, it is not too farfetched to conjecture that 
Turing machines, while rather primitive in principle, can be combined in many ways to make them 
quite powerful. Our examples were not general and detailed enough for us to claim that we have 
proved anything, but it should be plausible at this point that Turing machines can do some quite 
complicated things. 


EXERCISES 


1. Write out the complete solution to Example 9.14. 


2. Establish a convention for representing positive and negative integers in unary notation. With your 
convention, sketch the construction of a subtracter for computing x - y. 


3. Using adders, subtracters, comparers, copiers, or multipliers, draw block diagrams for Turing 
machines that compute the functions 


(a) $f(n) = n(n + 1)$, 
(b) $f(n) = n^2$, 
(c) $f(n) = 2^n$, 
(d) $f(n) = n!$, 
(e) $f(n) = n^{n!}$, 
for all positive integers n. 


4. Use a block diagram to sketch the implementation of a function f defined for all $w_1, w_2, w_3 \in \{1\}^*$ 
by 


$$f(w_1, w_2, w_3) = i,$$
where i is such that $|w_i| = \max(|w_1|, |w_2|, |w_3|)$ if no two w’s have the same length, and i = 0 otherwise. 


5. Provide a ‘high-level’ description for Turing machines that accept the following languages on { a, 
b}. For each problem, define a set of appropriate macroinstructions that you feel are reasonably 
easy to implement. Then use them for the solution. 


(a) $L = \{ww\}$. 
(b) $L = \{w_1 w_2 : w_1 \ne w_2, |w_1| = |w_2|\}$. 
(c) The complement of the language in part (a). 
(d) $L = \{a^n b^m : m = n^2, n \ge 1\}$. 
(e) $L = \{a^n : n \text{ is a prime number}\}$. 
6. Suggest a method for representing rational numbers on a Turing machine, then sketch a method for 
adding and subtracting such numbers. 


7. Sketch the construction of a Turing machine that can perform the addition and multiplication of 
positive integers x and y given in the usual decimal notation. 


8. Give an implementation of the macroinstruction 
searchright(a, $q_i$, $q_j$), 


which indicates that the machine is to search its tape to the right of the current position for the first 
occurrence of the symbol a. If an a is encountered before a blank, the machine is to go into state 
$q_i$, otherwise it is to go into state $q_j$. 


9. Use the macroinstruction in the previous exercise to design a Turing machine on Σ = {a, b} that 
accepts the language L (ab*ab*a). 


10. Use the macroinstruction searchright in Exercise 8 to create a Turing machine program that 
replaces the symbol immediately to the left of the leftmost a by a blank. If the input contains no a, 
replace the rightmost nonblank symbol by a b. 


9.3 Turing's Thesis 


The preceding discussion not only shows how a Turing machine can be constructed from simpler 
parts, but also illustrates a negative aspect of working with such low-level automata. While it takes 
very little imagination or ingenuity to translate a block diagram or pseudocode into the corresponding 
Turing machine program, actually doing it is time-consuming, error-prone, and adds little to our 
understanding. The instruction set of a Turing machine is so restricted that any argument, solution, or 
proof for a nontrivial problem is quite tedious. 


We now face a dilemma: We want to claim that Turing machines can perform not only the simple 
operations for which we have provided explicit programs, but also more complex processes 
describable by block diagrams or pseudocode. To defend such claims against challenge, we should 
show the relevant programs explicitly. But doing so is unpleasant and distracting, and ought to be 
avoided if possible. Somehow, we would like to find a way of carrying out a reasonably rigorous 
discussion of Turing machines without having to write lengthy, low-level code. There is unfortunately 
no completely satisfactory way of getting out of the predicament; the best we can do is to reach a 
reasonable compromise. To see how we might achieve such a compromise, we turn to a somewhat 
philosophical issue. 


We can draw some simple conclusions from the examples in the previous section. The first is that 
Turing machines appear to be more powerful than pushdown automata (for a comment on this, see 
Exercise 2 at the end of this section). In Example 9.8, we sketched the construction of a Turing 
machine for a language that is not context-free and for which, consequently, no pushdown automaton 
exists. Examples 9.9, 9.10, and 9.11 show that Turing machines can do some simple arithmetic 
operations, perform string manipulations, and make some simple comparisons. The discussion also 
illustrates how primitive operations can be combined to solve more complex problems, how several 
Turing machines can be composed, and how one program can act as a subprogram for another. Since 
very complex operations can be built this way, we might suspect that a Turing machine begins to 
approach a typical computer in power. 


Suppose we were to make the conjecture that, in some sense, Turing machines are equal in power 
to a typical digital computer. How could we defend or refute such a hypothesis? To defend it, we 
could take a sequence of increasingly more difficult problems and show how they are solved by some 
Turing machine. We might also take the machine language instruction set of a specific computer and 
design a Turing machine that can perform all the instructions in the set. This would undoubtedly tax 
our patience, but it ought to be possible in principle if our hypothesis is correct. Still, while every 
success in this direction would strengthen our conviction of the truth of the hypothesis, it would not 
lead to a proof. The difficulty lies in the fact that we don't know exactly what is meant by “a typical 
digital computer” and that we have no means for making a precise definition. 


We can also approach the problem from the other side. We might try to find some procedure for 
which we can write a computer program, but for which we can show that no Turing machine can 
exist. If this were possible, we would have a basis for rejecting the hypothesis. But no one has yet 
been able to produce a counterexample; the fact that all such tries have been unsuccessful must be 
taken as circumstantial evidence that it cannot be done. Every indication is that Turing machines are 
in principle as powerful as any computer. 


Arguments of this type led A. M. Turing and others in the mid-1930s to the celebrated conjecture 
called the Turing thesis. This hypothesis states that any computation that can be carried out by 
mechanical means can be performed by some Turing machine. 


This is a sweeping statement, so it is important to keep in mind what Turing's thesis is. It is not 
something that can be proved. To do so, we would have to define precisely the term “mechanical 
means.” This would require some other abstract model and leave us no further ahead than before. The 
Turing thesis is more properly viewed as a definition of what constitutes a mechanical computation: 
A computation is mechanical if and only if it can be performed by some Turing machine. 


If we take this attitude and regard the Turing thesis simply as a definition, we raise the question as 
to whether this definition is sufficiently broad. Is it far-reaching enough to cover everything we now 
do (and conceivably might do in the future) with computers? An unequivocal “yes” is not possible, 
but the evidence in its favor is very strong. Some arguments for accepting the Turing thesis as the 
definition of a mechanical computation are 


1. Anything that can be done on any existing digital computer can also be done by a Turing machine. 


2. No one has yet been able to suggest a problem, solvable by what we intuitively consider an 
algorithm, for which a Turing machine program cannot be written. 


3. Alternative models have been proposed for mechanical computation, but none of them is more 
powerful than the Turing machine model. 


These arguments are circumstantial, and Turing's thesis cannot be proved by them. In spite of its 
plausibility, Turing's thesis is still an assumption. But viewing Turing's thesis simply as an arbitrary 
definition misses an important point. In some sense, Turing's thesis plays the same role in computer 
science as do the basic laws of physics and chemistry. Classical physics, for example, is based 
largely on Newton's laws of motion. Although we call them laws, they do not have logical necessity; 
rather, they are plausible models that explain much of the physical world. We accept them because 
the conclusions we draw from them agree with our experience and our observations. Such laws 
cannot be proved to be true, although they can possibly be invalidated. If an experimental result 
contradicts a conclusion based on the laws, we might begin to question their validity. On the other 
hand, repeated failure to invalidate a law strengthens our confidence in it. This is the situation for 
Turing's thesis, so we have some reason for considering it a basic law of computer science. The 
conclusions we draw from it agree with what we know about real computers, and so far, all attempts 
to invalidate it have failed. There is always the possibility that someone will come up with another 
definition that will account for some subtle situations not covered by Turing machines but which still 
fall within the range of our intuitive notion of mechanical computation. In such an eventuality, some of 
our subsequent discussions would have to be modified significantly. However, the likelihood of this 
happening seems to be very small. 


Having accepted Turing's thesis, we are in a position to give a precise definition of an algorithm. 


Definition 9.5 


An algorithm for a function $f : D \to R$ is a Turing machine M, which given as input any $d \in D$ on 
its tape, eventually halts with the correct answer $f(d) \in R$ on its tape. Specifically, we can require 
that 


$$q_0 d \vdash^*_M q_f f(d), \quad q_f \in F,$$


for all $d \in D$. 
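
As an illustration of this definition, the following Python sketch runs a small Turing machine 
that halts with f(d) on its tape for f(x, y) = x + y in unary notation. The simulator run_tm, the 
blank symbol "_", and the transition table ADDER are our own constructions in the spirit of the 
adder discussed earlier; they are assumptions for illustration, not a transcription of any 
example in the text. 

```python
BLANK = "_"


def run_tm(delta, start, word, max_steps=10_000):
    """Run a standard Turing machine from the leftmost symbol of `word`.

    The machine halts when no transition is defined for (state, symbol).
    Returns (state, nonblank tape contents), or None if it has not halted
    within max_steps moves.
    """
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in delta:
            cells = [tape[i] for i in sorted(tape)]
            return state, "".join(cells).strip(BLANK)
        state, tape[head], move = delta[(state, symbol)]
        head += 1 if move == "R" else -1
    return None


# Unary adder: w(x)0w(y) becomes w(x+y)0, halting in the final state q4.
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),      # overwrite the separator with a 1
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),  # right end reached, step back
    ("q2", "1"): ("q3", "0", "L"),      # turn the last 1 into the trailing 0
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),  # return to the leftmost symbol
}


def add_unary(x, y):
    """Compute w(x+y)0 for positive integers x and y by running ADDER."""
    state, tape = run_tm(ADDER, "q0", "1" * x + "0" + "1" * y)
    assert state == "q4"  # halted in the final state
    return tape
```

The step bound max_steps is a practical safeguard of the sketch only; the definition itself 
requires that the machine halt for every d in D. 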


Identifying an algorithm with a Turing machine program allows us to prove rigorously such 
claims as “there exists an algorithm...” or “there is no algorithm....” However, to construct explicitly 
an algorithm for even relatively simple problems is a very lengthy undertaking. To avoid such 
unpleasant prospects, we can appeal to Turing's thesis and claim that anything we can do on any 
computer can also be done on a Turing machine. Consequently, we could substitute “C program” for 
“Turing machine” in Definition 9.5. This would ease the burden of exhibiting algorithms 
considerably. Actually, as we have already done, we will go one step further and accept verbal 
descriptions or block diagrams as algorithms on the assumption that we could write a Turing machine 
program for them if we were challenged to do so. This greatly simplifies the discussion, but it 
obviously leaves us open to criticism. While “C program” is well defined, “clear verbal description” 
is not, and we are in danger of claiming the existence of nonexistent algorithms. But this danger is 
more than offset by the facts that we can keep the discussion simple and intuitively clear and that we 
can give concise descriptions for some rather complex processes. The reader who has any doubts 
about the validity of these claims can dispel them by writing a suitable program in some programming 
language. 


EXERCISES 


* 1. Consider the set of machine language instructions for a computer of your choice. Sketch how the 
various instructions in this set could be carried out by a Turing machine. 


2. In the above discussion, we stated at one point that Turing machines appear to be more powerful 
than pushdown automata. Since the tape of a Turing machine can always be made to behave like a 
stack, it would seem that we can actually claim that a Turing machine is more powerful. What 
important factor is not taken into account in this argument? 


* 3. There are a number of enjoyable articles on Turing machines in the popular literature. A good 
one is a paper in Scientific American, May 1984, by J. E. Hopcroft, titled “Turing Machines”. 
This paper talks about the ideas we have introduced here and also gives some of the historical 
context in which the work of Turing and others was done. Get a copy of this article and read it, 
then write a brief review of it. 


Chapter 10 


Other Models of 
Turing Machines 


Our definition of a standard Turing machine is not the only possible one; there are 
alternative definitions that could serve equally well. The conclusions we can draw about 
the power of a Turing machine are largely independent of the specific structure chosen for 
it. In this chapter we look at several variations, showing that the standard Turing machine 
is equivalent, in a sense we will define, to other, more complicated models. 


If we accept Turing's thesis, we expect that complicating the standard Turing machine by giving it 
a more complex storage device will not have any effect on the power of the automaton. Any 
computation that can be performed on such a new arrangement will still fall under the category of a 
mechanical computation and, therefore, can be done by a standard model. It is nevertheless instructive 
to study more complex models, if for no other reason than that an explicit demonstration of the 
expected result will demonstrate the power of the Turing machine and thereby increase our 
confidence in Turing's thesis. Many variations on the basic model of Definition 9.1 are possible. For 
example, we can consider Turing machines with more than one tape or with tapes that extend in 
several dimensions. We will consider variants that will be useful in subsequent discussions. 


We also look at nondeterministic Turing machines and show that they are no more powerful than 
deterministic ones. This is unexpected, since Turing's thesis covers only mechanical computations 
and does not address the clever guessing implicit in nondeterminism. Another issue that is not 
immediately resolved by Turing's thesis is that of one machine executing different programs at 
different times. This leads to the idea of a “reprogrammable” or “universal” Turing machine. 


Finally, in preparation for later chapters, we look at linear bounded automata. These are Turing 
machines that have an infinite tape, but that can make use of the tape only in a restricted way. 


10.1 Minor Variations on the Turing Machine Theme 


We first consider some relatively minor changes in Definition 9.1 and investigate whether these 
changes make any difference in the general concept. Whenever we change a definition, we introduce a 
new type of automata and raise the question whether these new automata are in any real sense 
different from those we have already encountered. What do we mean by an essential difference 
between one class of automata and another? Although there may be clear differences in their 
definitions, these differences may not have any interesting consequences. We have seen an example of 
this in the case of deterministic and nondeterministic finite automata. These have quite different 
definitions, but they are equivalent in the sense that they both are identified exactly with the family of 
regular languages. Extrapolating from this, we can define equivalence or nonequivalence for classes 
of automata in general. 


Equivalence of Classes of Automata 


Whenever we define equivalence for two automata or classes of automata, we must carefully state 
what is to be understood by this equivalence. For the rest of this chapter, we follow the precedent 
established for nfa's and dfa's and define equivalence with respect to the ability to accept languages. 


Definition 10.1 


Two automata are equivalent if they accept the same language. Consider two classes of automata $C_1$ 
and $C_2$. If for every automaton $M_1$ in $C_1$ there is an automaton $M_2$ in $C_2$ such that 


$$L(M_1) = L(M_2),$$


we say that $C_2$ is at least as powerful as $C_1$. If the converse also holds and for every $M_2$ in $C_2$ there is 
an $M_1$ in $C_1$ such that $L(M_1) = L(M_2)$, we say that $C_1$ and $C_2$ are equivalent. 


There are many ways to establish the equivalence of automata. The construction of Theorem 2.2 
does this for dfa's and nfa's. For demonstrating the equivalence in connection with Turing machines, 
we often use the important technique of simulation. 


Let M be an automaton. We say that another automaton $\widehat{M}$ can simulate a computation of M if $\widehat{M}$ 
can mimic the computation of M in the following manner. Let $d_0, d_1, \ldots$ be the sequence of 
instantaneous descriptions of the computation of M, that is, 


$$d_0 \vdash_M d_1 \vdash_M \cdots \vdash_M d_n \cdots.$$


Then $\widehat{M}$ simulates this computation if it carries out a computation analogous to that of M, 


$$\widehat{d}_0 \vdash^*_{\widehat{M}} \widehat{d}_1 \vdash^*_{\widehat{M}} \cdots \vdash^*_{\widehat{M}} \widehat{d}_n \cdots,$$


where $\widehat{d}_0, \widehat{d}_1, \ldots$ are instantaneous descriptions, such that each of them is associated with a unique 
configuration of M. In other words, if we know the computation carried out by $\widehat{M}$, we can determine 
from it exactly what computations M would have done, given the corresponding starting 
configuration. 


Note that the simulation of a single move $d_i \vdash_M d_{i+1}$ of M may involve several moves of $\widehat{M}$. The 
intermediate configurations in $\widehat{d}_i \vdash^*_{\widehat{M}} \widehat{d}_{i+1}$ may not correspond to any configuration of M, but this does 
not affect anything if we can tell which configurations of $\widehat{M}$ are relevant. As long as we can 
determine from the computation of $\widehat{M}$ what M would have done, the simulation is proper. If $\widehat{M}$ can 
simulate every computation of M, we say that $\widehat{M}$ can simulate M. It should be clear that if $\widehat{M}$ can 
simulate M, then matters can be arranged so that M and $\widehat{M}$ accept the same language, and the two 
automata are equivalent. To demonstrate the equivalence of two classes of automata, we show that for 
every machine in one class, there is a machine in the second class capable of simulating it, and vice 
versa. 


Turing Machines with a Stay-Option 


In our definition of a standard Turing machine, the read-write head must move either to the right or to 
the left. Sometimes it is convenient to provide a third option, to have the read-write head stay in place 
after rewriting the cell content. Thus, we can define a Turing machine with a stay-option by replacing 
δ in Definition 9.1 by 


$$\delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R, S\}$$


with the interpretation that S signifies no movement of the read-write head. This 
option does not extend the power of the automaton. 


Theorem 10.1 


The class of Turing machines with a stay-option is equivalent to the class of standard Turing 
machines. 


Proof: Since a Turing machine with a stay-option is clearly an extension of the standard model, it is 
obvious that any standard Turing machine can be simulated by one with a stay-option. 


To show the converse, let $M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$ be a Turing machine with a stay-option to be 
simulated by a standard Turing machine $\widehat{M} = (\widehat{Q}, \Sigma, \Gamma, \widehat{\delta}, \widehat{q}_0, \square, \widehat{F})$. For each move of M, the 
simulating machine $\widehat{M}$ does the following. If the move of M does not involve the stay-option, the 
simulating machine performs one move, essentially identical to the move to be simulated. If S is 
involved in the move of M, then $\widehat{M}$ will make two moves: The first rewrites the symbol and moves 
the read-write head right; the second moves the read-write head left, leaving the tape contents 
unaltered. The simulating machine can be constructed from M by defining $\widehat{\delta}$ as follows: For each 
transition 


$$\delta(q_i, a) = (q_j, b, L \text{ or } R),$$

we put into $\widehat{\delta}$ 

$$\widehat{\delta}(\widehat{q}_i, a) = (\widehat{q}_j, b, L \text{ or } R).$$


For each S-transition 


$$\delta(q_i, a) = (q_j, b, S),$$

we put into $\widehat{\delta}$ the corresponding transitions 

$$\widehat{\delta}(\widehat{q}_i, a) = (\widehat{q}_{jS}, b, R),$$

and 

$$\widehat{\delta}(\widehat{q}_{jS}, c) = (\widehat{q}_j, c, L)$$


for all $c \in \Gamma$. 


It is reasonably obvious that every computation of M has a corresponding computation of $\widehat{M}$, so 
that $\widehat{M}$ can simulate M. ∎ 
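
The construction in this proof is mechanical enough to sketch in Python. The function names 
remove_stay and run, the blank symbol "_", and the representation of the helper states as pairs 
(q_j, "S") are assumptions of the sketch; transitions are stored in a dictionary that maps 
(state, symbol) to (state, symbol, move). 

```python
BLANK = "_"


def remove_stay(delta, gamma):
    """Build standard transitions from transitions that may use the move S."""
    new_delta = {}
    for (q_i, a), (q_j, b, move) in delta.items():
        if move in "LR":
            # Ordinary move: copy it unchanged.
            new_delta[(q_i, a)] = (q_j, b, move)
        else:
            helper = (q_j, "S")  # helper state, one per target state q_j
            # First move: rewrite the symbol and step right.
            new_delta[(q_i, a)] = (helper, b, "R")
            # Second move: step back left without altering the tape.
            for c in gamma:
                new_delta[(helper, c)] = (q_j, c, "L")
    return new_delta


def run(delta, state, word, max_steps=1000):
    """Run until no transition applies; return (state, head, tape contents)."""
    tape, head = dict(enumerate(word)), 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            break
        state, tape[head], move = delta[key]
        head += {"L": -1, "R": 1, "S": 0}[move]
    contents = "".join(tape[i] for i in sorted(tape)).strip(BLANK)
    return state, head, contents


# A small stay-option machine: change the first a into b and stay put.
STAY = {
    ("q0", "b"): ("q0", "b", "R"),
    ("q0", "a"): ("q1", "b", "S"),
}
```

Running STAY and remove_stay(STAY, "ab_") on the same input ends in the same state, head 
position, and tape contents, although the standard machine needs one extra move per S-transition. 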


Simulation is a standard technique for showing the equivalence of automata, and the formalism we 
have described makes it possible, as shown in the above theorem, to talk about the process precisely 
and prove theorems about equivalence. In our subsequent discussion, we use the notion of simulation 
frequently, but we generally make no attempt to describe everything in a rigorous and detailed way. 
Complete simulations with Turing machines are often cumbersome. To avoid this, we keep our 
discussion descriptive, rather than in theorem-proof form. The simulations are given only in broad 
outline, but it should not be hard to see how they can be made rigorous. The reader will find it 
instructive to sketch each simulation in some higher-level language or in pseudocode. 


Figure 10.1 [Tape in which each cell is divided into three parts: Track 1, Track 2, Track 3.] 


Before introducing other models, we make one remark on the standard Turing machine. It is 
implicit in Definition 9.1 that each tape symbol can be a composite of characters rather than just a 
single one. This can be made more explicit by drawing an expanded version of Figure 9.1 (Figure 
10.1), in which the tape symbols are triplets from some simpler alphabet. 


In the picture, we have divided each cell of the tape into three parts, called tracks, each 
containing one member of the triplet. Based on this visualization, such an automaton is sometimes 
called a Turing machine with multiple tracks, but such a view in no way extends Definition 9.1, 
since all we need to do is make Γ an alphabet in which each symbol is composed of several parts. 
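
In programming terms, a multiple-track tape is simply a tape whose symbols are tuples. The short 
Python sketch below shows this; the base alphabet BASE, the helper track, and the sample tape are 
made up for illustration. 

```python
from itertools import product

BASE = ["0", "1", "_"]                # a simpler alphabet (assumed)
GAMMA = set(product(BASE, repeat=3))  # each tape symbol is a triplet


def track(tape, k):
    """Read track k (1, 2, or 3) off a tape whose cells are triplets."""
    return "".join(cell[k - 1] for cell in tape)


# Three cells of a three-track tape; every cell holds one symbol of GAMMA.
tape = [("1", "0", "_"), ("1", "1", "_"), ("0", "_", "1")]
```

A transition function over GAMMA reads and writes a whole triplet in one move, so nothing in the 
definition of the standard machine has to change. 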


However, other Turing machine models involve a change of definition, so the equivalence with 
the standard machine has to be demonstrated. Here we look at two such models, which are sometimes 
used as the standard definition. Some variants that are less common are explored in the exercises at 
the end of this section. 


Turing Machines with Semi-Infinite Tape 


Many authors do not consider the model in Figure 9.1 as standard, but use one with a tape that is 
unbounded only in one direction. We can visualize this as a tape that has a left boundary (Figure 
10.2). This Turing machine is otherwise identical to our standard model, except that no left move is 
permitted when the read-write head is at the boundary. 


Figure 10.2 


It is not difficult to see that this restriction does not affect the power of the machine. To simulate a 
standard Turing machine M by a machine $\widehat{M}$ with a semi-infinite tape, we use the arrangement shown 
in Figure 10.3. 


Figure 10.3 


The simulating machine $\widehat{M}$ has a tape with two tracks. On the upper one, we keep the information 
to the right of some reference point on M’s tape. The reference point could be, for example, the 
position of the read-write head at the start of the computation. The lower track contains the left part of 
M’s tape in reverse order. $\widehat{M}$ is programmed so that it will use information on the upper track only as 
long as M’s read-write head is to the right of the reference point, and work on the lower track as M 
moves into the left part of its tape. The distinction can be made by partitioning the state set of $\widehat{M}$ into 
two parts, say $Q_U$ and $Q_L$: the first to be used when working on the upper track, the second to be used 
on the lower one. Special end markers # are put on the left boundary of the tape to facilitate switching 
from one track to the other. For example, assume that the machine to be simulated and the simulating 
machine are in the respective configurations shown in Figure 10.4 and that the move to be simulated 
is generated by 


$$\delta(q_i, a) = (q_j, c, L).$$


The simulating machine will first move via the transition 


$$\widehat{\delta}(\widehat{q}_i, (a, b)) = (\widehat{q}_j, (c, b), L),$$


where $\widehat{q}_i \in Q_U$. Because $\widehat{q}_i$ belongs to $Q_U$, only information in the upper track is considered at 
this point. Now, the simulating machine sees (#, #) in state $\widehat{q}_j \in Q_U$. It next uses a transition 


$$\widehat{\delta}(\widehat{q}_j, (\#, \#)) = (\widehat{p}_j, (\#, \#), R)$$


with $\widehat{p}_j \in Q_L$, putting it into the configuration shown in Figure 10.5. Now the machine is in a state 
from $Q_L$ and will work on the lower track. Further details of the simulation are straightforward. 


Figure 10.4 [(a) Machine to be simulated, with the reference point marked. (b) Simulating machine.] 


Figure 10.5 [Sequence of configurations in simulating $\delta(q_i, a) = (q_j, c, L)$.] 
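
The two-track arrangement amounts to folding the two-way tape at the reference point. The Python 
sketch below makes the address translation explicit. It assumes, purely for illustration, that 
the reference cell has index 0 and is stored on the upper track, that position 0 of the 
semi-infinite tape holds the end marker (#, #), and that blanks are written as "_"; the names 
fold and fold_tape are hypothetical. 

```python
def fold(i):
    """Map cell i of the two-way tape to (track, position) on the one-way tape.

    Cells i >= 0 go on the upper track; cells i < 0 go on the lower track in
    reverse order. Position 0 is reserved for the end marker.
    """
    return ("upper", i + 1) if i >= 0 else ("lower", -i)


def fold_tape(cells, blank="_"):
    """Fold a two-way tape, given as {index: symbol}, into (upper, lower) pairs."""
    width = max(fold(i)[1] for i in cells) + 1 if cells else 1
    tape = [["#", "#"]] + [[blank, blank] for _ in range(width - 1)]
    for i, symbol in cells.items():
        trk, pos = fold(i)
        tape[pos][0 if trk == "upper" else 1] = symbol
    return [tuple(cell) for cell in tape]
```

A left move of M from cell 0 to cell -1 corresponds to staying at position 1 while switching from 
the upper to the lower track; the simulating machine detects this case by running into the end 
marker, as described above. 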


The Off-Line Turing Machine 


The general definition of an automaton in Chapter 1 contained an input file as well as temporary 
storage. In Definition 9.1 we discarded the input file for reasons of simplicity, claiming that this made 
no difference to the Turing machine concept. We now expand on this claim. 


If we put the input file back into the picture, we get what is known as an off-line Turing machine. 
In such a machine, each move is governed by the internal state, what is currently read from the input 
file, and what is seen by the read-write head. A schematic representation of an off-line machine is 
shown in Figure 10.6. A formal definition of an off-line Turing machine is easily made, but we will 
leave this as an exercise. What we want to do briefly is to indicate why the class of off-line Turing 
machines is equivalent to the class of standard machines. 


First, the behavior of any standard Turing machine can be simulated by some off-line model. All 
that needs to be done by the simulating machine is to copy the input from the input file to the tape. 
Then it can proceed in the same way as the standard machine. 


Figure 10.6
[Off-line Turing machine: read-only input file, control unit.]


Figure 10.7
[Four-track tape; control unit of $\hat{M}$.]


The simulation of an off-line machine M by a standard machine $\hat{M}$ requires a lengthier
description. A standard machine can simulate the computation of an off-line machine by using the
four-track arrangement shown in Figure 10.7. In that picture, the tape contents shown represent the
specific configuration of Figure 10.6. Each of the four tracks of $\hat{M}$ plays a specific role in the
simulation. The first track has the input, the second marks the position at which the input is read, the
third represents the tape of M, and the fourth shows the position of M’s read-write head.


The simulation of each move of M requires a number of moves of $\hat{M}$. Starting from some standard
position, say the left end, and with the relevant information marked by special end markers, $\hat{M}$
searches track 2 to locate the position at which the input file of M is read. The symbol found in the
corresponding cell on track 1 is remembered by putting the control unit of $\hat{M}$ into a state chosen
for this purpose. Next, track 4 is searched for the position of the read-write head of M. With the
remembered input and the symbol on track 3, we now know what M is to do. This information is again
remembered by $\hat{M}$ with an appropriate internal state. Next, all four tracks of $\hat{M}$’s tape are
modified to reflect the move of M. Finally, the read-write head of $\hat{M}$ returns to the standard
position for the simulation of the next move.


EXERCISES 


1. Give a formal definition of a Turing machine with a semi-infinite tape. Then prove that the class of 
Turing machines with semi-infinite tape is equivalent to the class of standard Turing machines. 


2. Give a formal definition of an off-line Turing machine. 


3. Give convincing arguments that any language accepted by an off-line Turing machine is also 
accepted by some standard machine. 


4. Consider a Turing machine that, on any particular move, can either change the tape symbol or 
move the read-write head, but not both. 


(a) Give a formal definition of such a machine. 
(b) Show that the class of such machines is equivalent to the class of standard Turing machines. 


5. Consider a model of a Turing machine in which each move permits the read-write head to travel
more than one cell to the left or right, the distance and direction of travel being one of the
arguments of $\delta$. Give a precise definition of such an automaton and sketch a simulation of it by a
standard Turing machine.


6. A nonerasing Turing machine is one that cannot change a nonblank symbol to a blank. This can be
achieved by the restriction that if


$$\delta(q_i, a) = (q_j, \square, L \text{ or } R),$$
then a must be $\square$. Show that no generality is lost by making such a restriction.


7. Consider a Turing machine that cannot write blanks; that is, for all $\delta(q_i, a) = (q_j, b, L \text{ or } R)$, b
must be in $\Gamma - \{\square\}$. Show how such a machine can simulate a standard Turing machine.


8. Suppose we make the requirement that a Turing machine can halt only in a final state, that is, we
ask that $\delta(q, a)$ be defined for all pairs (q, a) with $a \in \Gamma$ and $q \notin F$. Does this restrict the power
of the Turing machine?


9. Suppose we make the restriction that a Turing machine must always write a symbol different from
the one it reads, that is, if


$$\delta(q_i, a) = (q_j, b, L \text{ or } R),$$
then a and b must be different. Does this limitation reduce the power of the automaton?


10. Consider a version of the standard Turing machine in which transitions can depend not only on 
the cell directly under the read-write head, but also on the cells to the immediate right and left. 
Make a formal definition of such a machine, then sketch its simulation by a standard Turing 
machine. 


11. Consider a Turing machine with a different decision process in which transitions are made if the
current tape symbol is not one of a specified set. For example,


$$\delta(q_i, \{a, b\}) = (q_j, c, R)$$


will allow the indicated move if the current tape symbol is neither a nor b. Formalize this
concept and show that this modification is equivalent to a standard Turing machine.


10.2 Turing Machines with More Complex Storage 


The storage device of a standard Turing machine is so simple that one might think it possible to gain 
power by using more complicated storage devices. But this is not the case, as we now illustrate with 
two examples. 


Multitape Turing Machines 


A multitape Turing machine is a Turing machine with several tapes, each with its own independently 
controlled read-write head (Figure 10.8). 


The formal definition of a multitape Turing machine goes beyond Definition 9.1, since it requires
a modified transition function. Typically, we define an n-tape machine by $M = (Q, \Sigma, \Gamma, \delta, q_0, F)$,
where $Q, \Sigma, \Gamma, q_0, F$ are as in Definition 9.1, but where


$$\delta: Q \times \Gamma^n \to Q \times \Gamma^n \times \{L, R\}^n$$


specifies what happens on all the tapes. For example, if n = 2, with a current configuration shown in
Figure 10.8, then


$$\delta(q_0, a, e) = (q_1, x, y, L, R)$$


is interpreted as follows. The transition rule can be applied only if the machine is in state $q_0$ and the
first read-write head sees an a and the second an e. The symbol on the first tape will then be replaced
with an x and its read-write head will move to the left. At the same time, the symbol on the second
tape is rewritten as y and the read-write head moves right. The control unit then changes its state to
$q_1$ and the machine goes into the new configuration shown in Figure 10.9.
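A single move of an n-tape machine can be sketched as follows. This is a minimal illustration, assuming tapes are stored as dictionaries from cell positions to symbols and that "_" stands for the blank; the function name step and this representation are choices made here, not part of the definition.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol


def step(state, tapes, heads, delta):
    """Apply one move of an n-tape machine.

    Returns (new_state, new_heads), or None if no rule applies (the machine halts).
    """
    n = len(tapes)
    seen = tuple(tape[h] for tape, h in zip(tapes, heads))
    rule = delta.get((state, *seen))
    if rule is None:
        return None
    new_state, writes, moves = rule[0], rule[1:1 + n], rule[1 + n:]
    for tape, h, symbol in zip(tapes, heads, writes):
        tape[h] = symbol  # rewrite the cell under each head
    new_heads = [h + (1 if m == "R" else -1) for h, m in zip(heads, moves)]
    return new_state, new_heads


# The rule delta(q0, a, e) = (q1, x, y, L, R) discussed in the text.
delta = {("q0", "a", "e"): ("q1", "x", "y", "L", "R")}
tape1 = defaultdict(lambda: BLANK, {0: "a"})
tape2 = defaultdict(lambda: BLANK, {0: "e"})
result = step("q0", [tape1, tape2], [0, 0], delta)
```

After the call, the first head has moved one cell left and the second one cell right, matching the interpretation given above.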


To show the equivalence between multitape and standard Turing machines, we argue that any
given multitape Turing machine M can be simulated by a standard Turing machine $\hat{M}$ and,
conversely, that any standard Turing machine can be simulated by a multitape one. The second part of
this claim needs no elaboration, since we can always elect to run a multitape machine with only one
of its tapes doing useful work. The simulation of a multitape machine by one with a single tape is a
little more complicated, but conceptually straightforward.


Figure 10.8


Consider, for example, the two-tape machine in the configuration depicted in Figure 10.10. The
simulating single-tape machine will have four tracks (Figure 10.11). The first track represents the
contents of tape 1 of M. The nonblank part of the second track has all zeros, except for a single 1
marking the position of M’s read-write head. Tracks 3 and 4 play a similar role for tape 2 of M.
Figure 10.11 makes it clear that, for the relevant configurations of $\hat{M}$ (that is, the ones that have
the indicated form), there is a unique corresponding configuration of M.

The representation of a multitape machine by a single-tape machine is similar to that used in the
simulation of an off-line machine. The actual steps in the simulation are also much the same, the only
difference being that there are more tapes to consider. The outline given for the simulation of off-line
machines carries over to this case with minor modifications and suggests a procedure by which the
transition function of $\hat{M}$ can be constructed from the transition function $\delta$ of M. While it is not
difficult to make the construction precise, it takes a lot of writing. Certainly, the computations of
$\hat{M}$ give the appearance of being lengthy and elaborate, but this has no bearing on the conclusion.
Whatever can be done on M can also be done on $\hat{M}$.


Figure 10.10


Figure 10.11


It is important to keep in mind the following point. When we claim that a Turing machine with 
multiple tapes is no more powerful than a standard one, we are making a statement only about what 
can be done by these machines, particularly, what languages can be accepted. 


Example 10.1 


Consider the language $\{a^n b^n\}$. In Example 9.7, we described a laborious method by which this
language can be accepted by a Turing machine with one tape. Using a two-tape machine makes the job
much easier. Assume that an initial string $a^n b^n$ is written on tape 1 at the beginning of the
computation. We then read all the a’s, copying them onto tape 2. When we reach the end of the a’s,
we match the b’s on tape 1 against the copied a’s on tape 2. This way, we can determine whether
there are an equal number of a’s and b’s without repeated back-and-forth movement of the read-write
head.
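The two-tape strategy can be mimicked in a few lines of Python. The sketch below is not a Turing machine program; it only follows the same two phases, with Python lists standing in for the tapes. The function name accepts_anbn and the requirement that n be at least 1 are assumptions made here.

```python
def accepts_anbn(w):
    """Decide membership in {a^n b^n}, n >= 1 assumed, in the style of a two-tape machine."""
    tape1 = list(w)
    tape2 = []
    head1 = 0

    # Phase 1: read all the a's on tape 1, copying them onto tape 2.
    while head1 < len(tape1) and tape1[head1] == "a":
        tape2.append("a")
        head1 += 1
    if not tape2:
        return False

    # Phase 2: match each b on tape 1 against one copied a on tape 2.
    head2 = len(tape2) - 1
    while head1 < len(tape1) and tape1[head1] == "b" and head2 >= 0:
        head1 += 1
        head2 -= 1

    # Accept only if both tapes are exhausted at the same time.
    return head1 == len(tape1) and head2 < 0
```

Each head moves in one direction only during each phase, which is the saving over the single-tape method.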


Remember that the various models of Turing machines are considered equivalent only with 
respect to their ability to do things, not with respect to ease of programming or any other efficiency 
measure we might consider. We will return to this important point in Chapter 14. 


Multidimensional Turing Machines 


A multidimensional Turing machine is one in which the tape can be viewed as extending infinitely in 
more than one dimension. A diagram of a two-dimensional Turing machine is shown in Figure 10.12. 


Figure 10.12


The formal definition of a two-dimensional Turing machine involves a transition function $\delta$ of the
form


$$\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R, U, D\},$$


where U and D specify movement of the read-write head up and down, respectively.


To simulate this machine on a standard Turing machine, we can use the two-track model depicted
in Figure 10.13. First, we associate an ordering or address with the cells of the two-dimensional
tape. This can be done in a number of ways, for example, in the two-dimensional fashion indicated in
Figure 10.12. The two-track tape of the simulating machine will use one track to store cell contents
and the other one to keep the associated address. In the scheme of Figure 10.12, the configuration in
which cell (1, 2) contains a and cell (10, −3) contains b is shown in Figure 10.13. Note one
complication: The cell address can involve arbitrarily large integers, so the address track cannot use
a fixed-size field to store addresses. Instead, we must use a variable field-size arrangement, using
some special symbols to delimit the fields, as shown in the picture.


Let us assume that, at the start of the simulation of each move, the read-write head of the two-dimensional
machine M and the read-write head of the simulating machine $\hat{M}$ are always on
corresponding cells. To simulate a move, the simulating machine $\hat{M}$ first computes the address of the
cell to which M is to move. Using the two-dimensional address scheme, this is a simple computation.
Once the address is computed, $\hat{M}$ finds the cell with this address on track 2 and then changes the cell
contents to account for the move of M. Again, given M, there is a straightforward construction for $\hat{M}$.
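The address-track idea can be sketched in Python as follows. This is a loose illustration only: the list of (address, contents) records plays the role of the two tracks, and the coordinate convention in MOVES (R and U increase the first and second coordinate, respectively) is an assumption, as are the names find_or_add and simulate_move.

```python
BLANK = "_"  # assumed blank symbol
MOVES = {"L": (-1, 0), "R": (1, 0), "U": (0, 1), "D": (0, -1)}  # assumed convention


def find_or_add(records, address):
    """Scan the address track for `address`; append a blank record if it is absent."""
    for k, (addr, _) in enumerate(records):
        if addr == address:
            return k
    records.append((address, BLANK))
    return len(records) - 1


def simulate_move(records, head, symbol, move):
    """Write `symbol` at the current record, then locate the record M moves to."""
    (x, y), _ = records[head]
    records[head] = ((x, y), symbol)
    dx, dy = MOVES[move]
    return find_or_add(records, (x + dx, y + dy))


# Cell (1, 2) contains a and cell (10, -3) contains b, as in the text.
records = [((1, 2), "a"), ((10, -3), "b")]
head = simulate_move(records, 0, "c", "U")
```

Because addresses are stored as Python integers, the variable field-size issue mentioned above does not arise here; on a real tape the fields would have to be delimited explicitly.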


Figure 10.13


EXERCISES


The purpose of much of our discussion of Turing machines is to lend credence to Turing's thesis by 
showing how seemingly more complex situations can be simulated on a standard Turing machine. 
Unfortunately, detailed simulations are very tedious and conceptually uninteresting. In the exercises 
below, describe the simulations in just enough depth to show that the details can be worked out. 


1. Define what one might call a multitape off-line Turing machine and describe how it can be 
simulated by a standard Turing machine. 


2. A multihead Turing machine can be visualized as a Turing machine with a single tape and a single 
control unit but with multiple, independent read-write heads. Give a formal definition of a 
multihead Turing machine, and then show how such a machine can be simulated with a standard 
Turing machine. 


3. Give a formal definition of a multihead-multitape Turing machine. Then show how such a machine 
can be simulated by a standard Turing machine. 


4. Give a formal definition of a Turing machine with a single tape but multiple control units, each 
with a single read-write head. Show how such a machine can be simulated with a multitape 
machine. 


“5. A queue automaton is an automaton in which the temporary storage is a queue. Assume that 
such a machine is an on-line machine, that is, it has no input file, with the string to be processed 
placed in the queue prior to the start of the computation. Give a formal definition of such an 
automaton, then investigate its power in relation to Turing machines. 


“6. Show that for every Turing machine there exists an equivalent standard Turing machine with no 
more than six states. 


“7. Reduce the number of required states in Exercise 6 as far as you can. (Hint: The smallest
possible number is three.)


“8. A counter is a stack with an alphabet of exactly two symbols, a stack start symbol and a counter 
symbol. Only the counter symbol can be put on the stack or removed from it. A counter 
automaton is a deterministic automaton with one or more counters as storage. Show that any 
Turing machine can be simulated using a counter automaton with four counters. 


9. Show that every computation that can be done by a standard Turing machine can be done by a
multitape machine with a stay-option and at most two states. 


10. Write out a detailed program for the computation in Example 10.1. 


10.3 Nondeterministic Turing Machines 


While Turing's thesis makes it plausible that the specific tape structure is immaterial to the power of 
the Turing machine, the same cannot be said of nondeterminism. Since nondeterminism involves an 
element of choice and so has a nonmechanistic flavor, an appeal to Turing's thesis is inappropriate. 
We must look at the effect of nondeterminism in more detail if we want to argue that nondeterminism 
adds nothing to the power of a Turing machine. Again we resort to simulation, showing that 
nondeterministic behavior can be handled deterministically. 


Definition 10.2 


A nondeterministic Turing machine is an automaton as given by Definition 9.1, except that $\delta$ is
now a function


$$\delta: Q \times \Gamma \to 2^{Q \times \Gamma \times \{L, R\}}.$$


As always when nondeterminism is involved, the range of $\delta$ is a set of possible transitions, any of
which can be chosen by the machine.


Example 10.2 


If a Turing machine has transitions specified by
$$\delta(q_0, a) = \{(q_1, b, R), (q_2, c, L)\},$$
it is nondeterministic. The moves
$$q_0 aaa \vdash b q_1 aa$$


and


$$q_0 aaa \vdash q_2 \square caa$$


are both possible.


Since it is not clear what role nondeterminism plays in computing functions, nondeterministic
automata are usually viewed as accepters. A nondeterministic Turing machine is said to accept w if
there is any possible sequence of moves such that


$$q_0 w \vdash^* x_1 q_f x_2,$$


with $q_f \in F$. A nondeterministic machine may have moves available that lead to a nonfinal state or to
an infinite loop. But, as always with nondeterminism, these alternatives are irrelevant; all we are
interested in is the existence of some sequence of moves leading to acceptance.


To show that a nondeterministic Turing machine is no more powerful than a deterministic one, we 
need to provide a deterministic equivalent for the nondeterminism. We have already alluded to one. 
Nondeterminism can be viewed as a deterministic backtracking algorithm, and a deterministic 
machine can simulate a nondeterministic one as long as it can handle the bookkeeping involved in the 
backtracking. To see how this can be done simply, let us consider an alternative view of 
nondeterminism, one which is useful in many arguments: A nondeterministic machine can be seen as 
one that has the ability to replicate itself whenever necessary. When more than one move is possible, 
the machine produces as many replicas as needed and gives each replica the task of carrying out one 
of the alternatives. This view of nondeterminism may seem particularly nonmechanistic, since 
unlimited replication is certainly not within the power of present-day computers. Nevertheless, a 
simulation is possible. 


One way to visualize the simulation is to use a standard Turing machine, keeping all possible 
instantaneous descriptions of the nondeterministic machine on its tape, separated by some convention. 
Figure 10.14 shows a way in which the two configurations $a q_0 aa$ and $bb q_1 a$ might appear. The
symbols x are used to delimit the area of interest, while + separates individual instantaneous 
descriptions. The simulating machine looks at all active configurations and updates them according to 
the program of the nondeterministic machine. New configurations or expanding instantaneous 
descriptions will involve moving the x markers. The details are certainly tedious, but not hard to 
visualize. Based on this simulation, we conclude that for every nondeterministic Turing machine there 
exists an equivalent deterministic standard machine. 
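The simulation can be sketched in Python by keeping the set of all active configurations and updating every one of them in each round. This is an illustration under assumptions, not the tape-level construction: a configuration is a triple (state, tape, head position), "_" is the blank, entering a final state counts as acceptance, and the names successors and nd_accepts are hypothetical.

```python
BLANK = "_"  # assumed blank symbol


def successors(config, delta):
    """Yield every configuration reachable from `config` in one move."""
    state, tape, head = config
    for new_state, symbol, move in delta.get((state, tape[head]), ()):
        cells = list(tape)
        cells[head] = symbol
        pos = head + (1 if move == "R" else -1)
        if pos < 0:                 # extend the tape on the left
            cells.insert(0, BLANK)
            pos = 0
        elif pos == len(cells):     # extend the tape on the right
            cells.append(BLANK)
        yield (new_state, tuple(cells), pos)


def nd_accepts(delta, finals, start, w, max_rounds=1000):
    """True or False if acceptance is settled within max_rounds, otherwise None."""
    frontier = {(start, tuple(w) or (BLANK,), 0)}
    for _ in range(max_rounds):
        if any(state in finals for state, _, _ in frontier):
            return True
        frontier = {nxt for cfg in frontier for nxt in successors(cfg, delta)}
        if not frontier:            # every branch has halted without accepting
            return False
    return None
```

The round limit is needed because some branches may loop forever; the sketch then reports None rather than claiming rejection.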


Theorem 10.2 


The class of deterministic Turing machines and the class of nondeterministic Turing machines are 
equivalent. 


Proof: Use the construction suggested above to show that any nondeterministic Turing machine can be
simulated by a deterministic one. ■


Later we will reconsider the effect of nondeterminism in practical situations, so we need to add
some comments. As always, nondeterminism can be seen as a choice between alternatives. This can
be visualized as a decision tree (Figure 10.15).


Figure 10.15
[Decision tree: initial configuration; configurations after one move; configurations after two moves; halting branches.]


The width of such a configuration tree depends on the branching factor, that is, the number of 
options available on each move. If k denotes the maximum branching, then 


$$M = k^n \qquad (10.1)$$


is the maximum number of configurations that can exist after n moves. 


For later purposes, it is necessary to elaborate on the definition of language acceptance and also 
include the membership issue. 


Definition 10.3 


A nondeterministic Turing machine M is said to accept a language L if, for all $w \in L$, at least one
of the possible configurations accepts w. There may be branches that lead to nonaccepting
configurations, while some may put the machine into an infinite loop. But these are irrelevant for
acceptance.


A nondeterministic Turing machine M is said to decide a language L if, for all $w \in \Sigma^*$, there is a
path that leads either to acceptance or rejection.


EXERCISES 


1. Discuss in detail the simulation of a nondeterministic Turing machine by a deterministic one.
Indicate explicitly how new machines are created, how active machines are identified, and how
machines that halt are removed from further consideration.


2. Show how a two-dimensional nondeterministic Turing machine can be simulated by a 
deterministic machine. 


3. Write a program for a nondeterministic Turing machine that accepts the language
$$L = \{ww : w \in \{a, b\}^+\}.$$
Contrast this with a deterministic solution.


4. Outline how one would write a program for a nondeterministic Turing machine to accept the
language


$$L = \{ww^R w : w \in \{a, b\}^*\}.$$


5. Write a simple program for a nondeterministic Turing machine that accepts the language
$$L = \{xww^R y : x, y, w \in \{a, b\}^*, |x| > |y|\}.$$
How would you solve this problem deterministically?


6. Design a nondeterministic Turing machine that accepts the language
$$L = \{a^n : n \text{ is not a prime number}\}.$$


7. A two-stack automaton is a nondeterministic pushdown automaton with two independent stacks. 
To define such an automaton, we modify Definition 7.1 so that 


$$\delta: Q \times (\Sigma \cup \{\lambda\}) \times \Gamma \times \Gamma \to \text{finite subsets of } Q \times \Gamma^* \times \Gamma^*.$$


A move depends on the tops of the two stacks and results in new values being pushed on these 
two stacks. Show that the class of two-stack automata is equivalent to the class of Turing 
machines. 


10.4 A Universal Turing Machine 


Consider the following argument against Turing's thesis: “A Turing machine as presented in 
Definition 9.1 is a special-purpose computer. Once $\delta$ is defined, the machine is restricted to carrying
out one particular type of computation. Digital computers, on the other hand, are general-purpose 
machines that can be programmed to do different jobs at different times. Consequently, Turing 
machines cannot be considered equivalent to general-purpose digital computers.” 


This objection can be overcome by designing a reprogrammable Turing machine, called a
universal Turing machine. A universal Turing machine $M_u$ is an automaton that, given as input the
description of any Turing machine M and a string w, can simulate the computation of M on w. To
construct such an $M_u$, we first choose a standard way of describing Turing machines. We may,
without loss of generality, assume that


$$Q = \{q_1, q_2, \ldots, q_n\},$$


with $q_1$ the initial state, $q_2$ the single final state, and
$$\Gamma = \{a_1, a_2, \ldots, a_m\},$$


where $a_1$ represents the blank. We then select an encoding in which $q_1$ is represented by 1, $q_2$ is
represented by 11, and so on. Similarly, $a_1$ is encoded as 1, $a_2$ as 11, etc. The symbol 0 will be used
as a separator between the 1’s. With the initial and final state and the blank defined by this
convention, any Turing machine can be described completely with $\delta$ only. The transition function is
encoded according to this scheme, with the arguments and result in some prescribed sequence. For
example, $\delta(q_1, a_2) = (q_2, a_3, L)$ might appear as


...10110110111010....


It follows from this that any Turing machine has a finite encoding as a string on {0,1}* and that, given
any encoding of M, we can decode it uniquely. Some strings will not represent any Turing machine
(e.g., the string 00011), but we can easily spot these, so they are of no concern.
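The encoding scheme can be tried out with a short Python sketch. The text fixes the codes for states and tape symbols; the codes for the directions are not spelled out, so the choice L = 1, R = 11 below is an assumption (it is consistent with the sample string above). The function names encode_transition and decode are hypothetical, and decode checks only the block structure, not whether the rules form a sensible machine.

```python
DIRECTIONS = {"L": 1, "R": 2}  # assumption: L is coded as 1 and R as 11
NAMES = {code: name for name, code in DIRECTIONS.items()}


def encode_transition(qi, ak, qj, al, move):
    """Encode delta(q_qi, a_ak) = (q_qj, a_al, move) as blocks of 1's separated by 0's."""
    fields = (qi, ak, qj, al, DIRECTIONS[move])
    return "".join("1" * n + "0" for n in fields)


def decode(code):
    """Return the list of encoded transitions, or None if `code` is not a valid encoding."""
    blocks = code.split("0")
    # A valid encoding ends in 0 and consists of non-empty blocks in groups of five.
    if blocks.pop() != "" or not blocks or len(blocks) % 5 or "" in blocks:
        return None
    sizes = [len(block) for block in blocks]
    rules = []
    for i in range(0, len(sizes), 5):
        qi, ak, qj, al, d = sizes[i:i + 5]
        if d not in NAMES:
            return None
        rules.append((qi, ak, qj, al, NAMES[d]))
    return rules


sample = encode_transition(1, 2, 2, 3, "L")  # delta(q1, a2) = (q2, a3, L)
```

With these conventions, sample reproduces the string shown in the text, and decode rejects 00011 because it does not end in a separator.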


A universal Turing machine $M_u$ then has an input alphabet that includes {0, 1} and the structure of
a multitape machine, as shown in Figure 10.16.


For any input M and w, tape 1 will keep an encoded definition of M. Tape 2 will contain the tape
contents of M, and tape 3 the internal state of M. $M_u$ looks first at the contents of tapes 2 and 3 to
determine the configuration of M. It then consults tape 1 to see what M would do in this configuration.
Finally, tapes 2 and 3 will be modified to reflect the result of the move.
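The three-tape organization translates almost directly into an interpreter. The following Python sketch is only an analogy, under several assumptions: the description is a valid encoding of a deterministic machine, directions are coded as 1 for L and 11 for R, tape symbols are given by their indices (so 1 is the blank), and the names enc, parse, and universal are hypothetical.

```python
def enc(q, a, p, b, d):
    """Encode one transition; d is 1 for L and 2 for R (assumed direction codes)."""
    return "".join("1" * n + "0" for n in (q, a, p, b, d))


def parse(description):
    """Turn an encoded description into a lookup table (assumes a valid encoding)."""
    sizes = [len(block) for block in description.split("0")[:-1]]
    return {
        (sizes[i], sizes[i + 1]): tuple(sizes[i + 2:i + 5])
        for i in range(0, len(sizes), 5)
    }


def universal(description, tape, max_steps=10_000):
    """Simulate the encoded machine on `tape`; True/False on halting, None on timeout."""
    table = parse(description)       # tape 1: the encoded definition of M
    cells = dict(enumerate(tape))    # tape 2: the tape contents of M
    state, head = 1, 0               # tape 3: the internal state of M (q1 is initial)
    for _ in range(max_steps):
        symbol = cells.get(head, 1)  # a1 (index 1) is the blank
        if (state, symbol) not in table:
            return state == 2        # M has halted; q2 is the single final state
        state, cells[head], direction = table[(state, symbol)]
        head += -1 if direction == 1 else 1
    return None


# A small machine: move right over a2's, enter q2 on reaching a blank.
description = enc(1, 2, 1, 2, 2) + enc(1, 1, 2, 1, 2)
```

Each pass through the loop does what is described above: read tapes 2 and 3, consult tape 1, then update tapes 2 and 3.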


It is within reason to construct an actual universal Turing machine (see, for example, Denning, 
Dennis, and Qualitz 1978), but the process is uninteresting. We prefer instead to appeal to Turing's 
hypothesis. The implementation clearly can be done using some programming language; in fact, the 
program suggested in Exercise 1, Section 9.1, is a realization of a universal Turing machine in a 
higher-level language. Therefore, we expect that it can also be done by a standard Turing machine. 
We are then justified in claiming the existence of a Turing machine that, given any program, can carry 
out the computations specified by that program and that is therefore a proper model for a
general-purpose computer.


Figure 10.16


The observation that every Turing machine can be represented by a string of 0’s and 1’s has
important implications. But before we explore these implications, we need to review some results 
from set theory. 


Some sets are finite, but most of the interesting sets (and languages) are infinite. For infinite sets,
we distinguish between sets that are countable and sets that are uncountable. A set is said to be
countable if its elements can be put into a one-to-one correspondence with the positive integers. By
this we mean that the elements of the set can be written in some order, say, $x_1, x_2, x_3, \ldots$, so that every
element of the set has some finite index. For example, the set of all even integers can be written in the
order 0, 2, 4,.... Since any positive integer 2n occurs in position n + 1, the set is countable. This
should not be too surprising, but there are more complicated examples, some of which may seem
counterintuitive. Take the set of all quotients of the form p/q, where p and q are positive integers.
How should we order this set to show that it is countable? We cannot use the sequence


$$\frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots$$


because then $\frac{2}{3}$ would never appear. This does not imply that the set is uncountable; in this case, there
is a clever way of ordering the set to show that it is in fact countable. Look at the scheme depicted in
Figure 10.17, and write down the elements in the order encountered following the arrows. In the
resulting sequence, the element $\frac{2}{3}$ occurs in the seventh place, and every element has some position in
the sequence. The set is therefore countable.


Figure 10.17


We see from this example that we can prove that a set is countable if we can produce a method by
which its elements can be written in some sequence. We call such a method an enumeration 
procedure. Since an enumeration procedure is some kind of mechanical process, we can use a Turing 
machine model to define it formally. 


Definition 10.4 


Let S be a set of strings on some alphabet $\Sigma$. Then an enumeration procedure for S is a Turing
machine that can carry out the sequence of steps


$$q_0 \square \vdash^* q_s x_1 \# s_1 \vdash^* q_s x_2 \# s_2 \cdots,$$


with $x_i \in \Gamma^* - \{\#\}$, $s_i \in S$, in such a way that any s in S is produced in a finite number of steps. The
state $q_s$ is a state signifying membership in S; that is, whenever $q_s$ is entered, the string following #
must be in S.


Not every set is countable. As we will see in the next chapter, there are some uncountable sets. 
But any set for which an enumeration procedure exists is countable because the enumeration gives the 
required sequence. 


Strictly speaking, an enumeration procedure cannot be called an algorithm since it will not 
terminate when S is infinite. Nevertheless, it can be considered a meaningful process, because it 
produces well-defined and predictable results. 


Example 10.3 


Let $\Sigma$ = {a, b, c}. We can show that the set S = $\Sigma^+$ is countable if we can find an enumeration procedure
that produces its elements in some order, say in the order in which they would appear in a dictionary.
However, the order used in dictionaries is not suitable without modification. In a dictionary, all
words beginning with a are listed before the string b. But when there are an infinite number of a
words, we will never reach b, thus violating the condition of Definition 10.4 that any given string be
listed after a finite number of steps.


Instead, we can use a modified order, in which we take the length of the string as the first 
criterion, followed by an alphabetic ordering of all equal-length strings. This is an enumeration 
procedure that gives the sequence 


a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, .... 


As we will have several uses for such an ordering, we will call it the proper order. 
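The proper order is easy to generate mechanically. The following Python generator is a sketch of such an enumeration; the function name proper_order is hypothetical, and the alphabet is assumed to be passed with its symbols already in alphabetic order.

```python
from itertools import count, islice, product


def proper_order(alphabet):
    """Yield all nonempty strings over `alphabet`: shorter strings first,
    strings of equal length in alphabetic order."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


first = list(islice(proper_order("abc"), 13))
```

Since there are only finitely many strings of each length, any given string is reached after a finite number of steps, which is exactly the condition of Definition 10.4.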


An important consequence of the previous discussion is that Turing machines are countable. 


Theorem 10.3 


The set of all Turing machines, although infinite, is countable.


Proof: We can encode each Turing machine using 0 and 1. With this encoding, we then construct the 
following enumeration procedure. 


1. Generate the next string in {0,1}* in proper order.


2. Check the generated string to see if it defines a Turing machine. If so, write it on the tape in the 
form required by Definition 10.4. If not, ignore the string. 


3. Return to Step 1. 


Since every Turing machine has a finite description, any specific machine will eventually be
generated by this process. ■
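The three steps of the proof can be sketched in Python. The validity check below is a simplification made for illustration: it accepts any string that is a sequence of five-block transition encodings with the direction coded as 1 or 11 (an assumed convention), and it does not test further properties such as determinism. The names VALID and enumerate_machines are hypothetical.

```python
import re
from itertools import count, islice, product

# One encoded transition: four blocks of 1's, then a direction block (1 or 11),
# each block followed by the separator 0. The direction codes are an assumption.
VALID = re.compile(r"(?:1+01+01+01+01{1,2}0)+")


def enumerate_machines():
    """Yield, in proper order, every binary string that passes the validity check."""
    for length in count(1):                       # Step 1: next strings in proper order
        for bits in product("01", repeat=length):
            candidate = "".join(bits)
            if VALID.fullmatch(candidate):        # Step 2: keep it only if it is an encoding
                yield candidate
            # Step 3: otherwise ignore the string and continue with the next one


first_two = list(islice(enumerate_machines(), 2))
```

Under these assumptions the shortest encoding has ten symbols, so the generator silently discards all shorter strings before producing its first output.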


The particular ordering of Turing machines depends on the encoding we use; if we use a different 
encoding, we must expect a different ordering. This is of no consequence, however, and shows that 
the ordering itself is unimportant. What matters is the existence of some ordering. 


EXERCISES 


1. Sketch an algorithm that examines a string in {0,1}* to determine whether or not it represents an 
encoded Turing machine. 


2. Give a complete encoding, using the suggested method, for the Turing machine with
$$\delta(q_1, a_1) = (q_1, a_1, R),$$
$$\delta(q_1, a_2) = (q_3, a_1, L),$$
$$\delta(q_3, a_1) = (q_2, a_2, L).$$


3. Sketch a Turing machine program that enumerates the set {0,1}* in proper order.


4. What is the index of 0’ in Exercise 3? 


5. Design a Turing machine that enumerates the following set in proper order.
$$L = \{a^n b^n : n > 1\}.$$


6. For Example 10.3, find a function f (w) that gives for each w its index in the proper ordering. 
7. Show that the set of all triplets, (i,j,k) with i,j,k positive integers, is countable.


8. Suppose that $S_1$ and $S_2$ are countable sets. Show that then $S_1 \cup S_2$ and $S_1 \times S_2$ are also countable.


9. Show that the Cartesian product of a finite number of countable sets is countable. 


10.5 Linear Bounded Automata 


While it is not possible to extend the power of the standard Turing machine by complicating the tape 
structure, it is possible to limit it by restricting the way in which the tape can be used. We have 
already seen an example of this with pushdown automata. A pushdown automaton can be regarded as 
a nondeterministic Turing machine with a tape that is restricted to being used like a stack. We can 
also restrict the tape usage in other ways; for example, we might permit only a finite part of the tape 
to be used as work space. It can be shown that this leads us back to finite automata (see Exercise 3 at 
the end of this section), so we need not pursue this. But there is a way of limiting tape use that leads 
to a more interesting situation: We allow the machine to use only that part of the tape occupied by the 
input. Thus, more space is available for long input strings than for short ones, generating another class 
of machines, the linear bounded automata (or lba).

A linear bounded automaton, like a standard Turing machine, has an unbounded tape, but how
much of the tape can be used is a function of the input. In particular, we restrict the usable part of the
tape to exactly the cells taken by the input.¹ To enforce this, we can envision the input as bracketed by
two special symbols, the left-end marker [ and the right-end marker ]. For an input w, the initial
configuration of the Turing machine is given by the instantaneous description $q_0[w]$. The end markers
cannot be rewritten, and the read-write head cannot move to the left of [ or to the right of ]. We
sometimes say that the read-write head “bounces” off the end markers.


Definition 10.5 
A linear bounded automaton is a nondeterministic Turing machine $M = (Q, \Sigma, \Gamma, \delta, q_0, \square, F)$, as in
Definition 10.2, subject to the restriction that $\Sigma$ must contain two special symbols [ and ], such that
$\delta(q_i, [)$ can contain only elements of the form $(q_j, [, R)$, and $\delta(q_i, ])$ can contain only elements of the
form $(q_j, ], L)$.


Definition 10.6 


A string w is accepted by a linear bounded automaton if there is a possible sequence of moves


$$q_0[w] \vdash^* [x_1 q_f x_2]$$


for some $q_f \in F$, $x_1, x_2 \in \Gamma^*$. The language accepted by the lba is the set of all such accepted strings.


Note that in this definition a linear bounded automaton is assumed to be nondeterministic. This is 
not just a matter of convenience but essential to the discussion of lba's.


Example 10.4 


The language 
$$L = \{a^n b^n c^n : n > 1\}$$


is accepted by some linear bounded automaton. This follows from the discussion in Example 9.8. The 
computation outlined there does not require space outside the original input, so it can be carried out 
by a linear bounded automaton. 


Example 10.5 


Find a linear bounded automaton that accepts the language
$$L = \{a^{n!} : n > 0\}.$$


One way to solve the problem is to divide the number of a’s successively by 2, 3, 4,..., until we can 
either accept or reject the string. If the input is in L, eventually there will be a single a left; if not, at 
some point a nonzero remainder will arise. We sketch the solution to point out one tacit implication of 
Definition 10.5. Since the tape of a linear bounded automaton may be multitrack, the extra tracks can 
be used as work space. For this problem, we can use a two-track tape. The first track contains the 
number of a’s left during the process of division, and the second track contains the current divisor 
(Figure 10.18). The actual solution is fairly simple. Using the divisor on the second track, we divide 
the number of a's on the first track, say by removing all symbols except those at multiples of the 
divisor. After this, we increment the divisor by one, and continue until we either find a nonzero 
remainder or are left with a single a. 


Figure 10.18 


[ a a a a a a ] a's to be examined 


[ a a a ] Current divisor 
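The successive-division idea of Example 10.5 can be sketched in a few lines of Python. This is only an arithmetic illustration, not a linear bounded automaton: the function name `accepts_factorial_length` is an assumption made for this sketch, and the two tape tracks are replaced by two integers (the number of a's left and the current divisor).

```python
def accepts_factorial_length(w: str) -> bool:
    """Decide whether w = a^(n!) for some n >= 0 by successive division.

    `remaining` plays the role of the first track (a's still to be examined),
    `divisor` the role of the second track (current divisor).
    """
    if not w or set(w) != {"a"}:
        return False              # only nonempty strings of a's can qualify
    remaining = len(w)
    divisor = 2
    while remaining > 1:
        if remaining % divisor != 0:
            return False          # a nonzero remainder arises: reject
        remaining //= divisor     # keep one a out of every `divisor`
        divisor += 1
    return True                   # a single a is left: accept
```

For an input of 24 a's the quotients are 12, 4, and 1, so the string is accepted; for 12 a's the quotients are 6 and 2, and dividing 2 by 4 leaves a remainder, so the string is rejected.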


The last two examples suggest that linear bounded automata are more powerful than pushdown 
automata, since neither of the languages is context-free. To prove such a conjecture, we still have to
show that any context-free language can be accepted by a linear bounded automaton. We will do this
later in a somewhat roundabout way; a more direct approach is suggested in Exercises 6 and 7 at the 
end of this section. It is not so easy to make a conjecture on the relation between Turing machines and 
linear bounded automata. Problems like Example 10.5 are invariably solvable by a linear bounded 
automaton, since an amount of scratch space proportional to the length of the input is available. In 
fact, it is quite difficult to come up with a concrete and explicitly defined language that cannot be 
accepted by any linear bounded automaton. In Chapter 11 we will show that the class of linear 
bounded automata is less powerful than the class of unrestricted Turing machines, but a demonstration 
of this requires a lot more work. 


EXERCISES 


1. Give details for the solution of Example 10.5.
2. Find a solution for Example 10.5 that does not require a second track as scratch space. 


3. Consider an off-line Turing machine in which the input can be read only once, moving left to right, 
and not rewritten. On its work tape, it can use at most n extra cells for work space, where n is 
fixed for all inputs. Show that such a machine is equivalent to a finite automaton. 


4. Find linear bounded automata for the following languages.
(a) $L = \{a^n : n = m^2, m \geq 1\}$.
(b) $L = \{a^n : n \text{ is a prime number}\}$.
(c) $L = \{a^n : n \text{ is not a prime number}\}$.
(d) $L = \{ww : w \in \{a,b\}^+\}$.
(e) $L = \{w^n : w \in \{a,b\}^+, n \geq 2\}$.
(f) $L = \{www^R : w \in \{a,b\}^+\}$.

5. Find an lba for the complement of the language in Example 10.5, assuming that $\Sigma = \{a,b\}$.


6. Show that for every context-free language there exists an accepting pda, such that the number of 
symbols in the stack never exceeds the length of the input string by more than one. 


7. Use the observation in the above exercise to show that any context-free language not containing $\lambda$
is accepted by some linear bounded automaton. 


8. To define a deterministic linear bounded automaton, we can use Definition 10.5, but require that 
the Turing machine be deterministic. Examine your solutions to Exercise 4. Are the solutions all 
deterministic linear bounded automata? If not, try to find solutions that are. 


¹ In some definitions, the usable part of the tape is a multiple of the input length, where the multiple can depend on the language, but
not on the input. Here we use only the exact length of the input string, but we do allow multitrack machines, with the input on only one
track.


Chapter 11 


A Hierarchy of Formal Languages and Automata


We now return our attention to our main interest, the study of formal languages. Our
immediate goal will be to examine the languages associated with Turing machines and
some of their restrictions. Because Turing machines can perform any kind of algorithmic
computation, we expect to find that the family of languages associated with them is quite
broad. It includes not only regular and context-free languages, but also the various
examples we have encountered that lie outside these families. The nontrivial question is whether 
there are any languages that are not accepted by some Turing machine. We will answer this question 
first by showing that there are more languages than Turing machines, so that there must be some 
languages for which there are no Turing machines. The proof is short and elegant, but 
nonconstructive, and gives little insight into the problem. For this reason, we will establish the 
existence of languages not recognizable by Turing machines through more explicit examples that 
actually allow us to identify one such language. Another avenue of investigation will be to look at the 
relation between Turing machines and certain types of grammars and to establish a connection 
between these grammars and regular and context-free grammars. This leads to a hierarchy of 
grammars and through it to a method for classifying language families. Some set-theoretic diagrams 
illustrate the relationships between various language families clearly. 


Strictly speaking, many of the arguments in this chapter are valid only for languages that do not 
include the empty string. This restriction arises from the fact that Turing machines, as we have 
defined them, cannot accept the empty string. To avoid having to rephrase the definition or having to 
add a repeated disclaimer, we make the tacit assumption that the languages discussed in this chapter, 
unless otherwise stated, do not contain $\lambda$. It is a trivial matter to restate everything so that $\lambda$ is
included, but we will leave this to the reader. 


11.1 Recursive and Recursively Enumerable Languages 


We start with some terminology for the languages associated with Turing machines. In doing so, we 
must make the important distinction between languages for which there exists an accepting Turing
machine and languages for which there exists a membership algorithm. Because a Turing machine 
does not necessarily halt on input that it does not accept, the first does not imply the second. 


Definition 11.1 


A language L is said to be recursively enumerable if there exists a Turing machine that accepts it. 


This definition implies only that there exists a Turing machine M, such that, for every $w \in L$,

$$q_0 w \vdash^* x_1 q_f x_2,$$

with $q_f$ a final state. The definition says nothing about what happens for w not in L; it may be that the
machine halts in a nonfinal state or that it never halts and goes into an infinite loop. We can be more
demanding and ask that the machine tell us whether or not any given input is in its language.


Definition 11.2 


A language L on $\Sigma$ is said to be recursive if there exists a Turing machine M that accepts L and that
halts on every w in $\Sigma^+$. In other words, a language is recursive if and only if there exists a
membership algorithm for it.


If a language is recursive, then there exists an easily constructed enumeration procedure. Suppose
that M is a Turing machine that determines membership in a recursive language L. We first construct
another Turing machine, say $\hat{M}$, that generates all strings in $\Sigma^+$ in proper order, let us say $w_1, w_2, \ldots$.
As these strings are generated, they become the input to M, which is modified so that it writes strings
on its tape only if they are in L.


That there is also an enumeration procedure for every recursively enumerable language is not as
easy to see. We cannot use the previous argument as it stands, because if some $w_j$ is not in L, the
machine M, when started with $w_j$ on its tape, may never halt and therefore never get to the strings in L
that follow $w_j$ in the enumeration. To make sure that this does not happen, the computation is
performed in a different way. We first get $\hat{M}$ to generate $w_1$ and let M execute one move on it. Then
we let $\hat{M}$ generate $w_2$ and let M execute one move on $w_2$, followed by the second move on $w_1$. After
this, we generate $w_3$ and do one step on $w_3$, the second step on $w_2$, the third step on $w_1$, and so on.
The order of performance is depicted in Figure 11.1. From this, it is clear that M will never get into
an infinite loop. Since any $w \in L$ is generated by $\hat{M}$ and accepted by M in a finite number of steps,
every string in L is eventually produced by M.
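The interleaved schedule just described can be sketched in Python. This is a hedged illustration under stated assumptions, not a Turing machine: a "machine" is modeled as a generator function that yields once per move and returns True when it accepts, and the names `proper_order`, `enumerate_accepted`, and `toy_machine` are hypothetical.

```python
from itertools import count, islice, product

def proper_order(alphabet):
    """Generate all nonempty strings over `alphabet` by length, then alphabetically."""
    for length in count(1):
        for letters in product(sorted(alphabet), repeat=length):
            yield "".join(letters)

def enumerate_accepted(machine, alphabet):
    """Yield every string accepted by `machine`, even if it loops on other inputs."""
    computations = []                          # (string, running computation) pairs
    for w in proper_order(alphabet):
        computations.append((w, machine(w)))   # start a computation on the new string
        unfinished = []
        for u, run in computations:
            try:
                next(run)                      # execute exactly one more move on u
                unfinished.append((u, run))
            except StopIteration as halt:
                if halt.value:                 # the computation halted and accepted
                    yield u
        computations = unfinished

def toy_machine(w):
    """Assumed example: accepts a^n for even n, never halts for odd n."""
    for _ in w:
        yield                                  # one move per input symbol
    if len(w) % 2 == 0:
        return True
    while True:
        yield                                  # infinite loop on rejected input

first_three = list(islice(enumerate_accepted(toy_machine, "a"), 3))
```

Because each round gives every unfinished computation only one more move, the non-halting runs on odd-length strings never block the enumeration.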


It is easy to see that every language for which an enumeration procedure exists is recursively 
enumerable. We simply compare the given input string against successive strings generated by the 
enumeration procedure. If w € L, we will eventually get a match, and the process can be terminated. 


Definitions 11.1 and 11.2 give us very little insight into the nature of either recursive or 
recursively enumerable languages. These definitions attach names to language families associated 
with Turing machines, but shed no light on the nature of representative languages in these families. 
Nor do they tell us much about the relationships between these languages or their connection to the 


language families we have encountered before. We are therefore immediately faced with questions 
such as “Are there languages that are recursively enumerable but not recursive?” and “Are there 
languages, describable somehow, that are not recursively enumerable?” While we will be able to 
supply some answers, we will not be able to produce very explicit examples to illustrate these 
questions, especially the second one. 


Figure 11.1


Languages That Are Not Recursively Enumerable


We can establish the existence of languages that are not recursively enumerable in a variety of ways. 
One is very short and uses a very fundamental and elegant result of mathematics. 


Theorem 11.1 


Let S be an infinite countable set. Then its power set $2^S$ is not countable.


Proof: Let $S = \{s_1, s_2, s_3, \ldots\}$. Then any element t of $2^S$ can be represented by a sequence of 0’s and
1’s, with a 1 in position i if and only if $s_i$ is in t. For example, the set $\{s_2, s_3, s_6\}$ is represented by
01100100..., while $\{s_1, s_3, s_5, \ldots\}$ is represented by 10101.... Clearly, any element of $2^S$ can be
represented by such a sequence, and any such sequence represents a unique element of $2^S$. Suppose
that $2^S$ were countable; then its elements could be written in some order, say $t_1, t_2, \ldots$, and we could
enter these into a table, as shown in Figure 11.2. In this table, take the elements in the main diagonal,
and complement each entry, that is, replace 0 with 1, and vice versa. In the example in Figure 11.2,
the elements are 1100..., so we get 0011... as the result. The new sequence along the diagonal
represents some element of $2^S$, say $t_i$ for some i. But it cannot be $t_1$ because it differs from $t_1$ through
$s_1$. For the same reason it cannot be $t_2$, $t_3$, or any other entry in the enumeration. This contradiction
creates a logical impasse that can be removed only by throwing out the assumption that $2^S$ is
countable. ■
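The diagonal construction in the proof can be tried on a finite corner of such a table. The sketch below is an illustration only: the table entries are made up for the example (chosen so that the diagonal reads 1100, as in the text), and the function name `complemented_diagonal` is an assumption.

```python
def complemented_diagonal(table):
    """Return the sequence obtained by flipping each entry on the main diagonal."""
    return [1 - table[i][i] for i in range(len(table))]

# Row i is the 0/1 sequence representing t_(i+1); values are illustrative only.
table = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]

new_sequence = complemented_diagonal(table)

# The new sequence disagrees with row i in position i, so it equals no row.
differs_from_every_row = all(
    new_sequence[i] != table[i][i] for i in range(len(table))
)
```

The same check succeeds however the rows are chosen, which is the content of the contradiction: no listing of sequences can contain its own complemented diagonal.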


This kind of argument, because it involves a manipulation of the diagonal elements of a table, is 
called diagonalization. The technique is attributed to the mathematician G. F. Cantor, who used it to 
demonstrate that the set of real numbers is not countable. In the next few chapters, we will see a
similar argument in several contexts. Theorem 11.1 is diagonalization in its purest form.


Figure 11.2 


As an immediate consequence of this result, we can show that, in some sense, there are fewer 
Turing machines than there are languages, so that there must be some languages that are not 
recursively enumerable. 


Theorem 11.2 


For any nonempty $\Sigma$, there exist languages that are not recursively enumerable.


Proof: A language is a subset of $\Sigma^*$, and every such subset is a language. Therefore, the set of all
languages is exactly $2^{\Sigma^*}$. Since $\Sigma^*$ is infinite, Theorem 11.1 tells us that the set of all languages on $\Sigma$ is
not countable. But the set of all Turing machines can be enumerated, so the set of all recursively
enumerable languages is countable. By Exercise 16 at the end of this section, this implies that there
must be some languages on $\Sigma$ that are not recursively enumerable. ■


This proof, although short and simple, is in many ways unsatisfying. It is completely
nonconstructive and, while it tells us of the existence of some languages that are not recursively
enumerable, it gives us no feeling at all for what these languages might look like. In the next set of 
results, we investigate the conclusion more explicitly. 


A Language That Is Not Recursively Enumerable 


Since every language that can be described in a direct algorithmic fashion can be accepted by a 
Turing machine and hence is recursively enumerable, the description of a language that is not 
recursively enumerable must be indirect. Nevertheless, it is possible. The argument involves a 
variation on the diagonalization theme. 


Theorem 11.3 


There exists a recursively enumerable language whose complement is not recursively enumerable. 


Proof: Let $\Sigma = \{a\}$, and consider the set of all Turing machines with this input alphabet. By Theorem
10.3, this set is countable, so we can associate an order $M_1, M_2, \ldots$ with its elements. For each Turing
machine $M_i$, there is an associated recursively enumerable language $L(M_i)$. Conversely, for each
recursively enumerable language on $\Sigma$, there is some Turing machine that accepts it.


We now consider a new language L defined as follows. For each $i \geq 1$, the string $a^i$ is in L if and
only if $a^i \in L(M_i)$. It is clear that the language L is well defined, since the statement $a^i \in L(M_i)$, and
hence $a^i \in L$, must be either true or false. Next, we consider the complement of L,

$$\overline{L} = \{a^i : a^i \notin L(M_i)\}, \tag{11.1}$$

which is also well defined but, as we will show, is not recursively enumerable.


We will show this by contradiction, starting from the assumption that $\overline{L}$ is recursively
enumerable. If this is so, then there must be some Turing machine, say $M_k$, such that

$$\overline{L} = L(M_k). \tag{11.2}$$

Consider the string $a^k$. Is it in L or in $\overline{L}$? Suppose that $a^k \in \overline{L}$. By (11.2) this implies that

$$a^k \in L(M_k).$$

But (11.1) now implies that

$$a^k \notin \overline{L}.$$

Alternatively, if we assume that $a^k$ is in L, then $a^k \notin \overline{L}$ and (11.2) implies that

$$a^k \notin L(M_k).$$

But then from (11.1) we get that

$$a^k \in \overline{L}.$$

The contradiction is inescapable, and we must conclude that our assumption that $\overline{L}$ is recursively
enumerable is false.

To complete the proof of the theorem as stated, we must still show that L is recursively
enumerable. For this we can use the known enumeration procedure for Turing machines. Given $a^i$, we
first find i by counting the number of a’s. We then use the enumeration procedure for Turing machines
to find $M_i$. Finally, we give its description along with $a^i$ to a universal Turing machine $M_u$ that
simulates the action of $M_i$ on $a^i$. If $a^i$ is in L, the computation carried out by $M_u$ will eventually halt.
The combined effect of this is a Turing machine that accepts every $a^i \in L$. Therefore, by Definition
11.1, L is recursively enumerable. ■


The proof of this theorem explicitly exhibits, through (11.1), a well-defined language that is not
recursively enumerable. This is not to say that there is an easy, intuitive interpretation of $\overline{L}$; it would
be difficult to exhibit more than a few trivial members of this language. Nevertheless, $\overline{L}$ is properly
defined.


A Language That Is Recursively Enumerable but Not Recursive


Next, we show there are some languages that are recursively enumerable but not recursive. Again, we 
need to do so in a rather roundabout way. We begin by establishing a subsidiary result.


Theorem 11.4 


If a language L and its complement $\overline{L}$ are both recursively enumerable, then both languages are
recursive. If L is recursive, then $\overline{L}$ is also recursive, and consequently both are recursively
enumerable.


Proof: If L and $\overline{L}$ are both recursively enumerable, then there exist Turing machines M and $\hat{M}$ that
serve as enumeration procedures for L and $\overline{L}$, respectively. The first will produce $w_1, w_2, \ldots$ in L, the
second $\hat{w}_1, \hat{w}_2, \ldots$ in $\overline{L}$. Suppose now we are given any $w \in \Sigma^+$. We first let M generate $w_1$ and
compare it with w. If they are not the same, we let $\hat{M}$ generate $\hat{w}_1$ and compare again. If we need to
continue, we next let M generate $w_2$, then $\hat{M}$ generate $\hat{w}_2$, and so on. Any $w \in \Sigma^+$ will be generated by
either M or $\hat{M}$, so eventually we will get a match. If the matching string is produced by M, w belongs
to L, otherwise it is in $\overline{L}$. The process is a membership algorithm for both L and $\overline{L}$, so they are both
recursive.

For the converse, assume that L is recursive. Then there exists a membership algorithm for it. But
this becomes a membership algorithm for $\overline{L}$ by simply complementing its conclusion. Therefore, $\overline{L}$
is recursive. Since any recursive language is recursively enumerable, the proof is completed. ■
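The alternating comparison used in the first half of the proof can be sketched as follows. This is an illustration under assumptions: the two enumeration procedures are modeled as Python generator factories, the names `is_member`, `even_length`, and `odd_length` are hypothetical, and the toy language (even-length strings of a's) is chosen only to make the sketch runnable.

```python
from itertools import count, zip_longest

def is_member(w, enumerate_L, enumerate_complement):
    """Decide membership of w by alternating between two enumerations.

    Terminates whenever every string appears in one of the two enumerations,
    which is the situation assumed in the proof.
    """
    pairs = zip_longest(enumerate_L(), enumerate_complement())
    for from_L, from_complement in pairs:
        if from_L == w:
            return True            # match produced by the enumerator for L
        if from_complement == w:
            return False           # match produced by the enumerator for the complement
    raise ValueError("the two enumerations do not cover w")

def even_length():
    """Assumed enumerator for L: aa, aaaa, aaaaaa, ..."""
    return ("a" * n for n in count(2, 2))

def odd_length():
    """Assumed enumerator for the complement of L: a, aaa, aaaaa, ..."""
    return ("a" * n for n in count(1, 2))
```

Calling `is_member("aaaa", even_length, odd_length)` finds the match on the second string produced for L, while `is_member("aaa", even_length, odd_length)` finds it on the second string produced for the complement.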


From this, we conclude directly that the family of recursively enumerable languages and the 
family of recursive languages are not identical. The language L in Theorem 11.3 is in the first but not 
in the second family. 


Theorem 11.5 


There exists a recursively enumerable language that is not recursive; that is, the family of recursive
languages is a proper subset of the family of recursively enumerable languages.


Proof: Consider the language L of Theorem 11.3. This language is recursively enumerable, but its
complement is not. Therefore, by Theorem 11.4, it is not recursive, giving us the looked-for example.
■


We see from this that there are indeed well-defined languages for which one cannot construct a 
membership algorithm. 


EXERCISES 


1. Prove that the set of all real numbers is not countable. 
2. Prove that the set of all languages that are not recursively enumerable is not countable. 


3. Let L be a finite language. Show that then L* is recursively enumerable. Suggest an enumeration 
procedure for L*. 


4. Let L be a context-free language. Show that $L^*$ is recursively enumerable and suggest an
enumeration procedure for it. 


5. Show that if a language is not recursively enumerable, its complement cannot be recursive.
6. Show that the family of recursively enumerable languages is closed under union.
7. Is the family of recursively enumerable languages closed under intersection?


8. Show that the family of recursive languages is closed under union and intersection.
9. Show that the families of recursively enumerable and recursive languages are closed under
reversal.


10. Is the family of recursive languages closed under concatenation? 
11. Prove that the complement of a context-free language must be recursive. 


12. Let $L_1$ be recursive and $L_2$ recursively enumerable. Show that $L_2 - L_1$ is necessarily recursively
enumerable.


13. Suppose that L is such that there exists a Turing machine that enumerates the elements of L in 
proper order. Show that this means that L is recursive. 


14. If L is recursive, is it necessarily true that L* is also recursive? 


15. Choose a particular encoding for Turing machines, and with it, find one element of the language 
L in Theorem 11.3. 


16. Let $S_1$ be a countable set, $S_2$ a set that is not countable, and $S_1 \subset S_2$. Show that $S_2$ must then
contain an infinite number of elements that are not in $S_1$.


17. In Exercise 16, show that in fact $S_2 - S_1$ cannot be countable.


18. Why does the argument in Theorem 11.1 fail when S is finite? 


19. Show that the set of all irrational numbers is not countable. 


11.2 Unrestricted Grammars 


To investigate the connection between recursively enumerable languages and grammars, we return to 
the general definition of a grammar in Chapter 1. In Definition 1.1 the production rules were allowed
to take any form, but various restrictions were later made to get specific grammar types. If we take the
general form and impose no restrictions, we get unrestricted grammars. 


Definition 11.3 


A grammar G = (V, T, S, P) is called unrestricted if all the productions are of the form

$$u \to v,$$

where u is in $(V \cup T)^+$ and v is in $(V \cup T)^*$.


In an unrestricted grammar, essentially no conditions are imposed on the productions. Any number 
of variables and terminals can be on the left or right, and these can occur in any order. There is only 
one restriction: $\lambda$ is not allowed as the left side of a production.


As we will see, unrestricted grammars are much more powerful than restricted forms like the 
regular and context-free grammars we have studied so far. In fact, unrestricted grammars correspond 
to the largest family of languages we can hope to recognize by mechanical means; that is,
unrestricted grammars generate exactly the family of recursively enumerable languages. We show this 
in two parts; the first is quite straightforward, but the second involves a lengthy construction. 


Theorem 11.6 


Any language generated by an unrestricted grammar is recursively enumerable. 


Proof: The grammar in effect defines a procedure for enumerating all strings in the language
systematically. For example, we can list all w in L such that

$$S \Rightarrow w,$$

that is, w is derived in one step. Since the set of the productions of the grammar is finite, there will be
a finite number of such strings. Next, we list all w in L that can be derived in two steps

$$S \Rightarrow x \Rightarrow w,$$

and so on. We can simulate these derivations on a Turing machine and, therefore, have an
enumeration procedure for the language. Hence it is recursively enumerable. ■
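The level-by-level enumeration in this proof can be sketched in Python. The sketch rests on simplifying assumptions that are not part of the text: every grammar symbol is a single character, productions are (left side, right side) pairs of strings with a nonempty left side, and the names `one_step` and `enumerate_language` are hypothetical. The toy grammar S → aSb | ab is used only because its output is easy to check.

```python
from itertools import islice

def one_step(form, productions):
    """All sentential forms obtainable from `form` by applying one production once."""
    results = set()
    for lhs, rhs in productions:
        position = form.find(lhs)
        while position != -1:                     # try every occurrence of the left side
            results.add(form[:position] + rhs + form[position + len(lhs):])
            position = form.find(lhs, position + 1)
    return results

def enumerate_language(productions, terminals, start="S"):
    """Yield terminal strings in order of the number of derivation steps."""
    produced = set()
    level = {start}                               # forms derivable in exactly k steps
    while level:
        next_level = set()
        for form in level:
            next_level |= one_step(form, productions)
        for form in sorted(next_level, key=lambda s: (len(s), s)):
            if set(form) <= set(terminals) and form not in produced:
                produced.add(form)
                yield form
        level = next_level

toy_grammar = [("S", "aSb"), ("S", "ab")]
first_words = list(islice(enumerate_language(toy_grammar, "ab"), 3))
```

Each level is finite because the set of productions is finite, which is what lets the procedure move on to the next derivation length.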


This part of the correspondence between recursively enumerable languages and unrestricted 
grammars is not surprising. The grammar generates strings by a well-defined algorithmic process, so 
the derivations can be done on a Turing machine. To show the converse, we describe how any Turing 
machine can be mimicked by an unrestricted grammar. 


We are given a Turing machine $M = (Q, \Sigma, \Gamma, \delta, q_0, \Box, F)$ and want to produce a grammar G
such that L(G) = L(M). The idea behind the construction is relatively simple, but its implementation
becomes notationally cumbersome.


Since the computation of the Turing machine can be described by the sequence of instantaneous
descriptions

$$q_0 w \vdash^* x q_f y, \tag{11.3}$$

we will try to arrange it so that the corresponding grammar has the property that

$$q_0 w \Rightarrow^* x q_f y \tag{11.4}$$

if and only if (11.3) holds. This is not hard to do; what is more difficult to see is how to make the
connection between (11.4) and what we really want, namely,

$$S \Rightarrow^* w$$

for all w satisfying (11.3). To achieve this, we construct a grammar which, in broad outline, has the 
following properties: 


1. S can derive $q_0 w$ for all $w \in \Sigma^+$.
2. (11.4) is possible if and only if (11.3) holds.
3. When a string $x q_f y$ with $q_f \in F$ is generated, the grammar transforms this string into the original w.


The complete sequence of derivations is then

$$S \Rightarrow^* q_0 w \Rightarrow^* x q_f y \Rightarrow^* w. \tag{11.5}$$


The third step in the above derivation is the troublesome one. How can the grammar remember w if it 
is modified during the second step? We solve this by encoding strings so that the coded version 
originally has two copies of w. The first is saved, while the second is used in the steps in (11.4).
When a final configuration is entered, the grammar erases everything except the saved w. 

To produce two copies of w and to handle the state symbol of M (which eventually has to be
removed by the grammar), we introduce variables $V_{ab}$ and $V_{aib}$ for all $a \in \Sigma \cup \{\Box\}$, $b \in \Gamma$, and all i
such that $q_i \in Q$. The variable $V_{ab}$ encodes the two symbols a and b, while $V_{aib}$ encodes a and b as
well as the state $q_i$.


The first step in (11.5) can be achieved (in the encoded form) by

$$S \to V_{\Box\Box} S \mid S V_{\Box\Box} \mid T, \tag{11.6}$$
$$T \to T V_{aa} \mid V_{a0a}, \tag{11.7}$$

for all $a \in \Sigma$. These productions allow the grammar to generate an encoded version of any string $q_0 w$
with an arbitrary number of leading and trailing blanks.
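For the second step, for each transition

$$\delta(q_i, c) = (q_j, d, R)$$

of M, we put into the grammar productions

$$V_{aic} V_{pq} \to V_{ad} V_{pjq} \tag{11.8}$$

for all $a, p \in \Sigma \cup \{\Box\}$, $q \in \Gamma$. For each

$$\delta(q_i, c) = (q_j, d, L)$$

of M, we include in G

$$V_{pq} V_{aic} \to V_{pjq} V_{ad} \tag{11.9}$$

for all $a, p \in \Sigma \cup \{\Box\}$, $q \in \Gamma$.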
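The production schemes (11.8) and (11.9) are mechanical enough to generate by program. The sketch below is an illustration under assumptions: a variable $V_{ab}$ is encoded as the tuple `("V", a, b)`, a variable $V_{aib}$ as `("V", a, i, b)`, a transition $\delta(q_i, c) = (q_j, d, R)$ as the dictionary entry `(i, c): (j, d, "R")`, and the names `step_productions`, `DELTA`, and `PRODUCTIONS` are hypothetical. The transition table is the two-move machine that accepts L(aa*).

```python
from itertools import product

BLANK = "□"

def step_productions(delta, sigma, gamma):
    """Build all productions of the forms (11.8) and (11.9) for a transition table."""
    first_indices = set(sigma) | {BLANK}          # a and p range over sigma plus blank
    productions = set()
    for (i, c), (j, d, move) in delta.items():
        for a, p, q in product(first_indices, first_indices, gamma):
            if move == "R":                       # form (11.8): state moves right
                lhs = (("V", a, i, c), ("V", p, q))
                rhs = (("V", a, d), ("V", p, j, q))
            else:                                 # form (11.9): state moves left
                lhs = (("V", p, q), ("V", a, i, c))
                rhs = (("V", p, j, q), ("V", a, d))
            productions.add((lhs, rhs))
    return productions

# delta(q0, a) = (q0, a, R) and delta(q0, blank) = (q1, blank, L)
DELTA = {(0, "a"): (0, "a", "R"), (0, BLANK): (1, BLANK, "L")}
PRODUCTIONS = step_productions(DELTA, sigma={"a", "b"}, gamma={"a", "b", BLANK})
```

With three choices each for a, p, and q, every transition contributes 27 productions, so even this small machine yields 54 of them, which gives a sense of why the construction is notationally cumbersome.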


If in the second step, M enters a final state, the grammar must then get rid of everything except w,
which is saved in the first indices of the V’s. Therefore, for every $q_j \in F$, we include productions

$$V_{ajb} \to a, \tag{11.10}$$

for all $a \in \Sigma \cup \{\Box\}$, $b \in \Gamma$. This creates the first terminal in the string, which then causes a rewriting
in the rest by

$$c V_{ab} \to ca, \tag{11.11}$$
$$V_{ab} c \to ac, \tag{11.12}$$

for all $a, c \in \Sigma \cup \{\Box\}$, $b \in \Gamma$. We need one more special production

$$\Box \to \lambda. \tag{11.13}$$

This last production takes care of the case when M moves outside that part of the tape occupied by the
input w. To make things work in this case, we must first use (11.6) and (11.7) to generate

$$\Box \cdots \Box \, q_0 w \, \Box \cdots \Box,$$

representing all the tape region used. The extraneous blanks are removed at the end by (11.13).


The following example illustrates this complicated construction. Carefully check each step in the 
example to see what the various productions do and why they are needed. 


Example 11.1 


Let $M = (Q, \Sigma, \Gamma, \delta, q_0, \Box, F)$ be a Turing machine with

$$Q = \{q_0, q_1\},$$
$$\Gamma = \{a, b, \Box\},$$
$$\Sigma = \{a, b\},$$
$$F = \{q_1\}$$

and

$$\delta(q_0, a) = (q_0, a, R),$$
$$\delta(q_0, \Box) = (q_1, \Box, L).$$

This machine accepts L(aa*).

Consider now the computation

$$q_0 aa \vdash a q_0 a \vdash aa q_0 \Box \vdash a q_1 a \Box, \tag{11.14}$$

which accepts the string aa. To derive this string with G, we first use rules of the form (11.6) and
(11.7) to get the appropriate starting string,

$$S \Rightarrow S V_{\Box\Box} \Rightarrow T V_{\Box\Box} \Rightarrow T V_{aa} V_{\Box\Box} \Rightarrow V_{a0a} V_{aa} V_{\Box\Box}.$$

The last sentential form is the starting point for the part of the derivation that mimics the computation
of the Turing machine. It contains the original input $aa\Box$ in the sequence of first indices and the initial
instantaneous description $q_0 aa \Box$ in the remaining indices. Next, we apply

$$V_{a0a} V_{aa} \to V_{aa} V_{a0a},$$

and

$$V_{a0a} V_{\Box\Box} \to V_{aa} V_{\Box 0 \Box},$$

which are specific instances of (11.8), and

$$V_{aa} V_{\Box 0 \Box} \to V_{a1a} V_{\Box\Box}$$

coming from (11.9). Then the next steps in the derivation are

$$V_{a0a} V_{aa} V_{\Box\Box} \Rightarrow V_{aa} V_{a0a} V_{\Box\Box} \Rightarrow V_{aa} V_{aa} V_{\Box 0 \Box} \Rightarrow V_{aa} V_{a1a} V_{\Box\Box}.$$

The sequence of first indices remains the same, always remembering the initial input. The sequence of
the other indices is

$$q_0 aa \Box, \quad a q_0 a \Box, \quad aa q_0 \Box, \quad a q_1 a \Box,$$

which is equivalent to the sequence of instantaneous descriptions in (11.14). Finally, (11.10) to
(11.13) are used in the last steps

$$V_{aa} V_{a1a} V_{\Box\Box} \Rightarrow V_{aa} \, a \, V_{\Box\Box} \Rightarrow V_{aa} \, a \Box \Rightarrow aa\Box \Rightarrow aa.$$


The construction described in (11.6) to (11.13) is the basis of the proof of the following result. 


Theorem 11.7 


For every recursively enumerable language L, there exists an unrestricted grammar G, such that L = 
L(G). 


Proof: The construction described guarantees that if

$$x \vdash y,$$

then

$$e(x) \Rightarrow e(y),$$

where e(x) denotes the encoding of a string according to the given convention. By an induction on the
number of steps, we can then show that

$$e(q_0 w) \Rightarrow^* e(y)$$

if and only if

$$q_0 w \vdash^* y.$$

We also must show that we can generate every possible starting configuration and that w is properly
reconstructed if and only if M enters a final configuration. The details, which are not too difficult, are
left as an exercise. ■


These two theorems establish what we set out to do. They show that the family of languages 
associated with unrestricted grammars is identical with the family of recursively enumerable 
languages. 


EXERCISES 


1. What language does the unrestricted grammar

$S \to S_1 B$,
$S_1 \to a S_1 b$,
$bB \to bbbB$,
$a S_1 b \to aa$,
$B \to \lambda$

derive?


2. What difficulties would arise if we allowed the empty string as the left side of a production in an 
unrestricted grammar? 


3. Consider a variation on grammars in which the starting point for any derivation can be a finite set 
of strings, rather than a single variable. Formalize this concept, then investigate how such 
grammars relate to the unrestricted grammars we have used here. 


4. In Example 11.1, prove that the constructed grammar cannot generate any sentence with a b in it.
5. Give the details of the proof of Theorem 11.7. 


6. Construct a Turing machine for L (01 (01)*), then find an unrestricted grammar for it using the 
construction in Theorem 11.7. Give a derivation for 0101 using the resulting grammar. 


7. Show that for every unrestricted grammar there exists an equivalent unrestricted grammar, all of
whose productions have the form

$$u \to v,$$

with $u, v \in (V \cup T)^+$ and $|u| \leq |v|$, or

$$A \to \lambda,$$

with $A \in V$.


8. Show that the conclusion of Exercise 7 still holds if we add the further conditions $|u| \leq 2$ and $|v| \leq 2$.


9. Some authors give a definition of unrestricted grammars that is not quite the same as our Definition
11.3. In this alternate definition, the productions of an unrestricted grammar are required to be of
the form

$$x \to y,$$

where

$$x \in (V \cup T)^* V (V \cup T)^*,$$

and

$$y \in (V \cup T)^*.$$


The difference is that here the left side must have at least one variable. Show that this alternate 
definition is basically the same as the one we use, in the sense that for every grammar of one type, 
there is an equivalent grammar of the other type. 


11.3 Context-Sensitive Grammars and Languages 


Between the restricted, context-free grammars and the general, unrestricted grammars, a great variety 
of “somewhat restricted” grammars can be defined. Not all cases yield interesting results; among the 
ones that do, the context-sensitive grammars have received considerable attention. These grammars 
generate languages associated with a restricted class of Turing machines, linear bounded automata, 
which we introduced in Section 10.5. 


Definition 11.4 


A grammar G = (V, T, S, P) is said to be context-sensitive if all productions are of the form

$$x \to y,$$

where $x, y \in (V \cup T)^+$ and

$$|x| \leq |y|. \tag{11.15}$$


This definition shows clearly one aspect of this type of grammar; it is noncontracting, in the 
sense that the length of successive sentential forms can never decrease. It is less obvious why such 
grammars should be called context-sensitive, but it can be shown (see, for example, Salomaa 1973) 
that all such grammars can be rewritten in a normal form in which all productions are of the form 


$$xAy \to xvy.$$

This is equivalent to saying that the production

$$A \to v$$

can be applied only in the situation where A occurs in a context of the string x on the left and the string
y on the right. While we use the terminology arising from this particular interpretation, the form itself
is of little interest to us here, and we will rely entirely on Definition 11.4.


Context-Sensitive Languages and Linear Bounded Automata 


As the terminology suggests, context-sensitive grammars are associated with a language family with 
the same name. 


Definition 11.5 


A language L is said to be context-sensitive if there exists a context-sensitive grammar G, such
that $L = L(G)$ or $L = L(G) \cup \{\lambda\}$.


In this definition, we reintroduce the empty string. Definition 11.4 implies that $x \to \lambda$ is not
allowed, so that a context-sensitive grammar can never generate a language containing the empty
string. Yet, every context-free language without $\lambda$ can be generated by a special case of a
context-sensitive grammar, say by one in Chomsky or Greibach normal form, both of which satisfy the
conditions of Definition 11.4. By including the empty string in the definition of a context-sensitive 
language (but not in the grammar), we can claim that the family of context-free languages is a subset 
of the family of context-sensitive languages. 


Example 11.2 


The language L = {a"b"c": n > 1} is a context-sensitive language. We show this by exhibiting a 
context-sensitive grammar for the language. One such grammar is 


S — abclaAbe, 


Ab — GA. 
Ac — Bbcc, 
bB — Bb, 


aB — aalaaA. 
We can see how this works by looking at a derivation of a*b°c°. 


S ⇒ aAbc ⇒ abAc ⇒ abBbcc
⇒ aBbbcc ⇒ aaAbbcc ⇒ aabAbcc
⇒ aabbAcc ⇒ aabbBbccc
⇒* aaabbbccc.


The solution effectively uses the variables A and B as messengers. An A is created on the left, travels 


to the right to the first c, where it creates another b and c. It then sends the messenger B back to the 
left in order to create the corresponding a. The process is very similar to the way one might program 
a Turing machine to accept the language L. 
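
The messenger behavior can be checked mechanically. The following is a minimal sketch, assuming sentential forms are represented as Python strings with uppercase letters for variables; the names PRODUCTIONS, successors, and is_derivation are illustrative and not part of the text. It replays the derivation of a^3 b^3 c^3 with the last three steps written out in full.

```python
# Productions of the grammar in Example 11.2 as (left side, right side) pairs.
PRODUCTIONS = [
    ("S", "abc"), ("S", "aAbc"),
    ("Ab", "bA"),
    ("Ac", "Bbcc"),
    ("bB", "Bb"),
    ("aB", "aa"), ("aB", "aaA"),
]


def successors(form):
    """Return every sentential form reachable from `form` in one step."""
    result = set()
    for left, right in PRODUCTIONS:
        pos = form.find(left)
        while pos != -1:
            result.add(form[:pos] + right + form[pos + len(left):])
            pos = form.find(left, pos + 1)
    return result


def is_derivation(forms):
    """Check that each form follows from its predecessor by one production."""
    return all(b in successors(a) for a, b in zip(forms, forms[1:]))


# The derivation of a^3 b^3 c^3, including the steps in which B travels left.
DERIVATION = [
    "S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aaAbbcc", "aabAbcc",
    "aabbAcc", "aabbBbccc", "aabBbbccc", "aaBbbbccc", "aaabbbccc",
]

print(is_derivation(DERIVATION))
```

Every step in the list applies exactly one production, and no step shortens the sentential form, as Definition 11.4 requires.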


Since the language in the previous example is not context-free, we see that the family of context- 
free languages is a proper subset of the family of context-sensitive languages. Example 11.2 also 
shows that it is not an easy matter to find a context-sensitive grammar even for relatively simple 
examples. Often the solution is most easily obtained by starting with a Turing machine program, then 
finding an equivalent grammar for it. A few examples will show that, whenever the language is 
context-sensitive, the corresponding Turing machine has predictable space requirements; in 
particular, it can be viewed as a linear bounded automaton. 


Theorem 11.8 


For every context-sensitive language L not including λ, there exists some linear bounded automaton M
such that L = L (M). 


Proof: If L is context-sensitive, then there exists a context-sensitive grammar G for L − {λ}. We show
that derivations in this grammar can be simulated by a linear bounded automaton. The linear bounded
automaton will have two tracks, one containing the input string w, the other containing the sentential
forms derived using G. A key point of this argument is that, because G is noncontracting, no
sentential form in a derivation of w can have length greater than |w|. Another point to notice is that
a linear bounded automaton is, by definition, nondeterministic. This is necessary in the argument,
since we can claim that the correct production can always be guessed and that no unproductive
alternatives have to be pursued. Therefore, the computation described in Theorem 11.6 can be carried
out without using space except that originally occupied by w; that is, it can be done by a linear
bounded automaton. ■


Theorem 11.9 


If a language L is accepted by some linear bounded automaton M, then there exists a context-sensitive
grammar that generates L. 


Proof: The construction here is similar to that in Theorem 11.7. All productions generated in
Theorem 11.7 are noncontracting except (11.13),

□ → λ.

But this production can be omitted. It is necessary only when the Turing machine moves outside the
bounds of the original input, which is not the case here. The grammar obtained by the construction
without this unnecessary production is noncontracting, completing the argument. ■


Relation Between Recursive and Context-Sensitive Languages 


Theorem 11.8 tells us that every context-sensitive language is accepted by some Turing machine and
is therefore recursively enumerable. Theorem 11.10 follows easily from this. 


Theorem 11.10 


Every context-sensitive language L is recursive. 


Proof: Consider the context-sensitive language L with an associated context-sensitive grammar G, 
and look at a derivation of w 


S ⇒ x_1 ⇒ x_2 ⇒ ··· ⇒ x_n ⇒ w.


We can assume without any loss of generality that all sentential forms in a single derivation are
different; that is, x_i ≠ x_j for all i ≠ j. The crux of our argument is that the number of steps in any
derivation is a bounded function of |w|. We know that

|x_j| ≤ |x_{j+1}|,

because G is noncontracting. The only thing we need to add is that there exist some m, depending
only on G and w, such that

|x_j| < |x_{j+m}|,

for all j, with m = m(|w|) a bounded function of |V ∪ T| and |w|. This follows because the finiteness of
|V ∪ T| implies that there are only a finite number of strings of a given length. Therefore, the length of
a derivation of w ∈ L is at most |w| m(|w|).


This observation gives us immediately a membership algorithm for L. We check all derivations of
length up to |w| m(|w|). Since the set of productions of G is finite, there are only a finite number of
these. If any of them give w, then w ∈ L; otherwise it is not. ■
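
A minimal sketch of such a membership test follows. It is a variation on the argument rather than a literal transcription: instead of enumerating derivations up to a length bound, it explores the sentential forms of length at most |w| breadth-first and never revisits a form, which terminates for the same reason (only finitely many such strings exist). The function name derives and the string representation of grammars are assumptions made for the illustration; the grammar used is the one from Example 11.2.

```python
from collections import deque


def derives(productions, start, w):
    """Decide whether w is derivable from `start` in a noncontracting grammar.

    productions: list of (left, right) string pairs with len(left) <= len(right).
    """
    if not w:
        return False  # a noncontracting grammar never derives the empty string
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for left, right in productions:
            pos = form.find(left)
            while pos != -1:
                nxt = form[:pos] + right + form[pos + len(left):]
                # Forms longer than w can never shrink back, so they are dropped.
                if len(nxt) <= len(w) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(left, pos + 1)
    return False


# Grammar of Example 11.2.
EXAMPLE_11_2 = [
    ("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
    ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA"),
]

for candidate in ["abc", "aabbcc", "aabbc", "abcabc"]:
    print(candidate, derives(EXAMPLE_11_2, "S", candidate))
```

The pruning step is where the noncontracting property is used: without it, the search space would not be finite.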


Theorem 11.11 


There exists a recursive language that is not context-sensitive. 


Proof: Consider the set of all context-sensitive grammars on T = {a, b}. We can use a convention in
which each grammar has a variable set of the form

V = {V_0, V_1, V_2, ...}.


Every context-sensitive grammar is completely specified by its productions; we can think of them as 
written as a single string 


x_1 → y_1; x_2 → y_2; ...; x_m → y_m.


To this string we now apply the homomorphism 


h(a) = 010,
h(b) = 01^2 0,
h(→) = 01^3 0,
h(;) = 01^4 0,
h(V_i) = 01^{i+5} 0.


Thus, any context-sensitive grammar can be represented uniquely by a string from L ((011*0)*). 
Furthermore, the representation is invertible in the sense that, given any such string, there is at most 
one context-sensitive grammar corresponding to it. 
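
The encoding and its inverse can be sketched as follows. This is an illustration under stated assumptions: symbols are given as Python strings, the arrow is written "->", variables are written "V0", "V1", and so on, and the names h, encode, and decode are chosen here rather than taken from the text. The decoder only recovers the symbol sequence; checking that the result is a well-formed context-sensitive grammar is a separate step.

```python
import re

# Number of 1's in the image of each fixed symbol under h.
FIXED = {"a": 1, "b": 2, "->": 3, ";": 4}


def h(symbol):
    """Image of a single symbol under the homomorphism of Theorem 11.11."""
    if symbol in FIXED:
        ones = FIXED[symbol]
    elif symbol.startswith("V") and symbol[1:].isdigit():
        ones = int(symbol[1:]) + 5
    else:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return "0" + "1" * ones + "0"


def encode(productions):
    """Encode a list of (left, right) productions, each side a list of symbols."""
    tokens = []
    for index, (left, right) in enumerate(productions):
        if index:
            tokens.append(";")
        tokens.extend(left)
        tokens.append("->")
        tokens.extend(right)
    return "".join(h(token) for token in tokens)


def decode(code):
    """Recover the symbol sequence from a string in L((011*0)*)."""
    if re.fullmatch(r"(01+0)*", code) is None:
        raise ValueError("string is not in L((011*0)*)")
    names = {count: symbol for symbol, count in FIXED.items()}
    tokens = []
    for block in re.findall(r"01+0", code):
        ones = len(block) - 2
        tokens.append(names.get(ones, f"V{ones - 5}"))
    return tokens


grammar = [(["V0"], ["a", "V0", "b"]), (["V0"], ["a", "b"])]
print(decode(encode(grammar)))
```

Because every block begins and ends with 0 and contains only 1's in between, the string splits into blocks in exactly one way, which is why the representation is invertible.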

Let us introduce a proper ordering on {0,1}*, so we can write strings in the order w_1, w_2, etc. A
given string w_i may not define a context-sensitive grammar; if it does, call the grammar G_i. Next, we
define a language L by


L = {w_i : w_i defines a context-sensitive grammar G_i and w_i ∉ L(G_i)}.


L is well defined and is in fact recursive. To see this, we construct a membership algorithm. Given
w_i, we check it to see if it defines a context-sensitive grammar G_i. If not, then w_i ∉ L. If the string
does define a grammar, then L(G_i) is recursive, and we can use the membership algorithm of
Theorem 11.10 to find out if w_i ∈ L(G_i). If it is not, then w_i belongs to L.


But L is not context-sensitive. If it were, there would exist some w_j such that L = L(G_j). We can
then ask if w_j is in L(G_j). If we assume that w_j ∈ L(G_j), then by definition w_j is not in L. But
L = L(G_j), so we have a contradiction. Conversely, if we assume that w_j ∉ L(G_j), then by definition
w_j ∈ L and we have another contradiction. We must therefore conclude that L is not
context-sensitive. ■


The result in Theorem 11.11 indicates that linear bounded automata are indeed less powerful than 
Turing machines, since they accept only a proper subset of the recursive languages. It follows from 
the same result that linear bounded automata are more powerful than pushdown automata. Context- 
free languages, being generated by context-free grammars, are a subset of the context-sensitive 
languages. As various examples show, they are a proper subset. Because of the essential equivalence 
of linear bounded automata and context-sensitive languages on one hand, and pushdown automata and 
context-free languages on the other, we see that any language accepted by a pushdown automaton is 
also accepted by some linear bounded automaton, but that there are languages accepted by some 
linear bounded automata for which there are no pushdown automata. 


EXERCISES 


* 1. Find context-sensitive grammars for the following languages.


(a) L = {a^{n+1} b^n c^{n−1} : n ≥ 1}.
(b) L = {a^n b^n a^{2n} : n ≥ 1}.
(c) L = {a^n b^m c^n d^m : n ≥ 1, m ≥ 1}.
(d) L = {ww : w ∈ {a, b}+}.
(e) L = {a^n b^n c^n d^n : n ≥ 1}.
* 2. Find context-sensitive grammars for the following languages.
(a) L = {w : n_a(w) = n_b(w) = n_c(w)}.
(b) L = {w : n_a(w) = n_b(w) < n_c(w)}.
3. Show that the family of context-sensitive languages is closed under union.
4. Show that the family of context-sensitive languages is closed under reversal.
5. For m in Theorem 11.10, give explicit bounds for m as a function of |w| and |V ∪ T|.


6. Without explicitly constructing it, show that there exists a context-sensitive grammar for the
language L = {wuw^R : w, u ∈ {a, b}*, |w| ≥ |u|}.


11.4 The Chomsky Hierarchy 


We have now encountered a number of language families, among them the recursively enumerable 
languages (L_RE), the context-sensitive languages (L_CS), the context-free languages (L_CF), and the
regular languages (L_REG). One way of exhibiting the relationship between these families is by the


Chomsky hierarchy. Noam Chomsky, a founder of formal language theory, provided an initial 
classification into four language types, type 0 to type 3. This original terminology has persisted and 
one finds frequent references to it, but the numeric types are actually different names for the language 
families we have studied. Type 0 languages are those generated by unrestricted grammars, that is, the 
recursively enumerable languages. Type 1 consists of the context-sensitive languages, type 2 consists 
of the context-free languages, and type 3 consists of the regular languages. As we have seen, each 
language family of type i is a proper subset of the family of type i − 1. A diagram (Figure 11.3)
exhibits the relationship clearly. Figure 11.3 shows the original Chomsky hierarchy. We have also
met several other language families that can be fitted into this picture. Including the families of
deterministic context-free languages (L_DCF) and recursive languages (L_REC), we arrive at the extended


hierarchy shown in Figure 11.4. 
Figure 11.3 


Figure 11.4 


Other language families can be defined and their place in Figure 11.4 studied, although their 
relationships do not always have the neatly nested structure of Figures 11.3 and 11.4. In some 
instances, the relationships are not completely understood. 


Example 11.3 
We have previously introduced the context-free language
L = {w : n_a(w) = n_b(w)}
and shown that it is deterministic, but not linear. On the other hand, the language
L = {a^n b^n} ∪ {a^n b^{2n}}
is linear, but not deterministic. This indicates that the relationship between regular, linear,
deterministic context-free, and nondeterministic context-free languages is as shown in Figure 11.5.


Figure 11.5


There is still an unresolved issue. We introduced the concept of a deterministic linear bounded 
automaton in Exercise 8, Section 10.5. We can now ask the question we asked in connection with 
other automata: What role does nondeterminism play here? Unfortunately, there is no easy answer. At 
this time, it is not known whether the family of languages accepted by deterministic linear bounded 
automata is a proper subset of the context-sensitive languages. 


To summarize, we have explored the relationships between several language families and their 
associated automata. In doing so, we established a hierarchy of languages and classified automata by 
their power as language accepters. Turing machines are more powerful than linear bounded automata. 
These in turn are more powerful than pushdown automata. At the bottom of the hierarchy are finite 
accepters, with which we began our study. 


EXERCISES 


1. Collect examples given in this book that demonstrate that all the subset relations depicted in 
Figure 11.4 are indeed proper ones. 


2. Find two examples (excluding the one in Example 11.3) of languages that are linear but not
deterministic context-free. 


3. Find two examples (excluding the one in Example 11.3) of languages that are deterministic 
context-free but not linear. 


Chapter 12 


Limits of Algorithmic Computation 


Having talked about what Turing machines can do, we now look at what they cannot do.
Although Turing's thesis leads us to believe that there are few limitations to the power of a
Turing machine, we have claimed on several occasions that there could not exist any
algorithms for the solution of certain problems. Now we make more explicit what we
mean by this claim. Some of the results came about quite simply; if a language is
nonrecursive, then by definition there is no membership algorithm for it. If this were all there was to
this issue, it would not be very interesting; nonrecursive languages have little practical value. But the
problem goes deeper. For example, we have stated (but not yet proved) that there exists no algorithm
to determine whether a context-free grammar is unambiguous. This question is clearly of practical
significance in the study of programming languages.


We first define the concepts of decidability and computability to pin down what we mean when 
we say that something cannot be done by a Turing machine. We then look at several classical 
problems of this type, among them the well-known halting problem for Turing machines. From this 
follow a number of related problems for Turing machines and recursively enumerable languages. 
After this, we look at some questions relating to context-free languages. Here we find quite a few 
important problems for which, unfortunately, there are no algorithms. 


12.1 Some Problems That Cannot Be Solved by Turing Machines 


The argument that the power of mechanical computations is limited is not surprising. Intuitively we 
know that many vague and speculative questions require special insight and reasoning well beyond 
the capacity of any computer that we can now construct or even foresee. What is more interesting to 
computer scientists is that there are questions that can be clearly and simply stated, with an apparent 
possibility of an algorithmic solution, but which are known to be unsolvable by any computer. 


Computability and Decidability 


In Definition 9.4, we stated that a function f on a certain domain is said to be computable if there 
exists a Turing machine that computes the value of f for all arguments in its domain. A function is 
uncomputable if no such Turing machine exists. There may be a Turing machine that can compute f on 
part of its domain, but we call the function computable only if there is a Turing machine that computes 
the function on the whole of its domain. We see from this that, when we classify a function as 
computable or not computable, we must be clear on what its domain is. 


Our concern here will be the somewhat simplified setting where the result of a computation is a 


simple “yes” or “no.” In this case, we talk about a problem being decidable or undecidable. By a 
problem we will understand a set of related statements, each of which must be either true or false. 
For example, we consider the statement “For a context-free grammar G, the language L (G) is 
ambiguous.” For some G this is true, for others it is false, but clearly we must have one or the other. 
The problem is to decide whether the statement is true for any G we are given. Again, there is an 
underlying domain, the set of all context-free grammars. We say that a problem is decidable if there 
exists a Turing machine that gives the correct answer for every statement in the domain of the 
problem. 


When we state decidability or undecidability results, we must always know what the domain is, 
because this may affect the conclusion. The problem may be decidable on some domain but not on 
another. Specifically, a single instance of a problem is always decidable, since the answer is either 
true or false. In the first case, a Turing machine that always answers “true” gives the correct answer, 
while in the second case one that always answers “false” is appropriate. This may seem like a 
facetious answer, but it emphasizes an important point. The fact that we do not know what the correct 
answer is makes no difference; what matters is that there exists some Turing machine that does give 
the correct response. 


The Turing Machine Halting Problem 


We begin with some problems that have historical significance and that at the same time give us a 
starting point for developing later results. The best-known of these is the Turing machine halting 
problem. Simply stated, the problem is: Given the description of a Turing machine M and an input w, 
does M, when started in the initial configuration q_0 w, perform a computation that eventually halts?
Using an abbreviated way of talking about the problem, we ask whether M applied to w, or simply 
(M,w), halts or does not halt. The domain of this problem is to be taken as the set of all Turing 
machines and all w; that is, we are looking for a single Turing machine that, given the description of 
an arbitrary M and w, will predict whether or not the computation of M applied to w will halt. 

We cannot find the answer by simulating the action of M on w, say by performing it on a universal 
Turing machine, because there is no limit on the length of the computation. If M enters an infinite 
loop, then no matter how long we wait, we can never be sure that M is in fact in a loop. It may simply
be a case of a very long computation. What we need is an algorithm that can determine the correct 
answer for any M and w by performing some analysis on the machine's description and the input. But 
as we now show, no such algorithm exists. 


For subsequent discussion, it is convenient to have a precise idea of what we mean by the halting 
problem; for this reason, we make a specific definition of what we stated somewhat loosely above. 


Definition 12.1 


Let w_M be a string that describes a Turing machine M = (Q, Σ, Γ, δ, q_0, □, F), and let w be a string in
M’s alphabet. We will assume that w_M and w are encoded as a string of 0’s and 1’s, as suggested in
Section 10.4. A solution of the halting problem is a Turing machine H, which for any w_M and w
performs the computation


q_0 w_M w ⊢*_H x_1 q_y x_2


if M applied to w halts, and


q_0 w_M w ⊢*_H y_1 q_n y_2


if M applied to w does not halt. Here q_y and q_n are both final states of H.


Theorem 12.1 


There does not exist any Turing machine H that behaves as required by Definition 12.1. The halting 
problem is therefore undecidable. 


Proof: We assume the contrary, namely, that there exists an algorithm, and consequently some Turing
machine H, that solves the halting problem. The input to H will be the string w_M w. The requirement is
then that, given any w_M w, the Turing machine H will halt with either a yes or no answer. We achieve
this by asking that H halt in one of two corresponding final states, say, q_y or q_n. The situation can be
visualized by a block diagram like Figure 12.1. The intent of this diagram is to indicate that, if H is
started in state q_0 with input w_M w, it will eventually halt in state q_y or q_n. As required by Definition
12.1, we want H to operate according to the following rules:


q_0 w_M w ⊢*_H x_1 q_y x_2


if M applied to w halts, and


q_0 w_M w ⊢*_H y_1 q_n y_2


if M applied to w does not halt.


Next, we modify H to produce a Turing machine H’ with the structure shown in Figure 12.2. With
the added states in Figure 12.2 we want to convey that the transitions between state q_y and the new
states q_a and q_b are to be made, regardless of the tape symbol, in such a way that the tape remains
unchanged. The way this is done is straightforward. Comparing H and H’ we see that, in situations
where H reaches q_y and halts, the modified machine H’ will enter an infinite loop. Formally, the
action of H’ is described by


q_0 w_M w ⊢*_H’ ∞


if M applied to w halts, and


q_0 w_M w ⊢*_H’ y_1 q_n y_2


if M applied to w does not halt.


Figure 12.1


Figure 12.2


From H’ we construct another Turing machine Ĥ. This new machine takes as input w_M and copies
it, ending in its initial state q_0. After that, it behaves exactly like H’. Then the action of Ĥ is such that


q_0 w_M ⊢*_Ĥ q_0 w_M w_M ⊢*_Ĥ ∞


if M applied to w_M halts, and


q_0 w_M ⊢*_Ĥ q_0 w_M w_M ⊢*_Ĥ y_1 q_n y_2


if M applied to w_M does not halt.


Now Ĥ is a Turing machine, so it has a description in {0,1}*, say, ŵ. This string, in addition to
being the description of Ĥ, also can be used as input string. We can therefore legitimately ask what
would happen if Ĥ is applied to ŵ. From the above, identifying M with Ĥ, we get


q_0 ŵ ⊢*_Ĥ ∞


if Ĥ applied to ŵ halts, and


q_0 ŵ ⊢*_Ĥ y_1 q_n y_2


if Ĥ applied to ŵ does not halt. This is clearly nonsense. The contradiction tells us that our
assumption of the existence of H, and hence the assumption of the decidability of the halting problem,
must be false. ■
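
The shape of this argument can be mirrored in a short program. The sketch below makes several simplifying assumptions that are not in the text: programs are plain Python callables rather than encoded machine descriptions, the claimed decider halts(program, data) is deterministic, and the names refute and contrary are invented for the illustration. It shows that any candidate decider gives the wrong answer on a program built from the decider itself.

```python
def refute(halts):
    """Exhibit a program on which the claimed halting decider is wrong.

    halts(program, data) is a hypothetical decider that is supposed to
    return True exactly when program(data) terminates.
    """
    def contrary(program):
        # Plays the role of H-hat: ask the decider about (program, program),
        # then do the opposite of what it predicts.
        if halts(program, program):
            while True:      # predicted "halts": loop forever
                pass
        return "halted"      # predicted "does not halt": halt at once

    claimed = halts(contrary, contrary)
    if not claimed:
        # Safe to run: with this prediction contrary(contrary) returns at once.
        assert contrary(contrary) == "halted"
    # By construction, contrary(contrary) halts exactly when the decider
    # predicts that it does not.
    actually_halts = not claimed
    return claimed, actually_halts


for candidate in (lambda program, data: True, lambda program, data: False):
    claimed, actual = refute(candidate)
    print("decider says:", claimed, "  actual behavior:", actual)
```

The two trivial candidates stand in for any decider; whatever answer a decider gives on (contrary, contrary), the construction makes that answer false.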


One may object to Definition 12.1, since we required that, to solve the halting problem, H had to 
start and end in very specific configurations. It is, however, not hard to see that these somewhat 
arbitrarily chosen conditions play only a minor role in the argument, and that essentially the same 
reasoning could be used with any other starting and ending configurations. We have tied the problem 
to a specific definition for the sake of the discussion, but this does not affect the conclusion. 


It is important to keep in mind what Theorem 12.1 says. It does not preclude solving the halting 
problem for specific cases; often we can tell by an analysis of M and w whether or not the Turing 
machine will halt. What the theorem says is that this cannot always be done; there is no algorithm that 
can make a correct decision for all w_M and w.


The arguments for proving Theorem 12.1 were given because they are classical and of historical 
interest. The conclusion of the theorem is actually implied in previous results as the following 
argument shows. 


Theorem 12.2 


If the halting problem were decidable, then every recursively enumerable language would be 
recursive. Consequently, the halting problem is undecidable. 


Proof: To see this, let L be a recursively enumerable language on Σ, and let M be a Turing machine
that accepts L. Let H be the Turing machine that solves the halting problem. We construct from this the 
following procedure: 


1. Apply H to w_M w. If H says “no,” then by definition w is not in L.


2. If H says “yes,” then apply M to w. But M must halt, so it will eventually tell us whether w is in L 
or not. 


This constitutes a membership algorithm, making L recursive. But we already know that there are 
recursively enumerable languages that are not recursive. The contradiction implies that H cannot 
exist, that is, that the halting problem is undecidable. ■
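
The two-step procedure can be written out as a small sketch. The halting decider is passed in as a parameter because no real one exists; the names make_membership_test, accepts, and toy_halts are illustrative assumptions, and the toy decider below is correct only for the one toy recognizer shown.

```python
def make_membership_test(accepts, halts):
    """Combine a recognizer M with a (hypothetical) halting decider H.

    accepts(w) models M: it returns True or False if it halts, and may run
    forever on strings outside the language.
    halts(machine, w) models H: True exactly when machine(w) terminates.
    """
    def member(w):
        if not halts(accepts, w):   # step 1: M never halts on w, so w is not in L
            return False
        return accepts(w)           # step 2: M is known to halt, so run it
    return member


def accepts(w):
    """Toy recognizer: loops on odd-length input, otherwise accepts a...-strings."""
    while len(w) % 2 == 1:
        pass
    return w.startswith("a")


def toy_halts(machine, w):
    """Stand-in for H, valid only for the toy recognizer above."""
    return len(w) % 2 == 0


member = make_membership_test(accepts, toy_halts)
print(member("ab"), member("ba"), member("a"))
```

The call member("a") returns without looping only because the decider is consulted first; that is the entire content of the reduction.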


The simplicity with which the halting problem can be obtained from Theorem 11.5 is a 
consequence of the fact that the halting problem and the membership problem for recursively 
enumerable languages are nearly identical. The only difference is that in the halting problem we do 
not distinguish between halting in a final and nonfinal state, whereas in the membership problem we 
do. The proofs of Theorem 11.5 (via Theorem 11.3) and 12.1 are closely related, both being a 
version of diagonalization. 


Reducing One Undecidable Problem to Another 


The above argument, connecting the halting problem to the membership problem, illustrates the very 
important technique of reduction. We say that a problem A is reduced to a problem B if the
decidability of A follows from the decidability of B. Then, if we know that A is undecidable, we can 
conclude that B is also undecidable. Let us do a few examples to illustrate this idea. 


Example 12.1 


The state-entry problem is as follows. Given any Turing machine M = (Q, Σ, Γ, δ, q_0, □, F) and any q ∈
Q, w ∈ Σ+, decide whether or not the state q is ever entered when M is applied to w. This problem is
undecidable. 


To reduce the halting problem to the state-entry problem, suppose that we have an algorithm A that
solves the state-entry problem. We could then use it to solve the halting problem. For example, given
any M and w, we first modify M to get M̂ in such a way that M̂ halts in state q if and only if M halts.
We can do this simply by looking at the transition function δ of M. If M halts, it does so because some
δ(q_i, a) is undefined. To get M̂ we change every such undefined δ to


δ(q_i, a) = (q, a, R),


where q is a new final state for which no transitions are defined. We apply the state-entry algorithm
A to (M̂, q, w). If A answers yes, that is, the state q is entered, then (M,w) halts. If A says no, then
(M,w) does not halt.


Thus, the assumption that the state-entry problem is decidable gives us an algorithm for the halting 
problem. Because the halting problem is undecidable, the state-entry problem must also be 
undecidable. 
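
The modification of M is purely mechanical, which the following sketch illustrates. It assumes a transition function stored as a Python dictionary mapping (state, symbol) to (state, symbol, move); the names add_halt_state and enters_state, the blank symbol "_", and the step limit are all choices made for this illustration. The bounded simulator is only a demonstration aid and is not a state-entry algorithm.

```python
def add_halt_state(delta, states, tape_alphabet, halt_state="q_halt"):
    """Return the transition table of M-hat.

    Every undefined move of M is replaced by a move into the new state
    halt_state, which has no outgoing transitions of its own.
    """
    if halt_state in states:
        raise ValueError("halt_state must be a new state")
    new_delta = dict(delta)
    for state in states:
        for symbol in tape_alphabet:
            new_delta.setdefault((state, symbol), (halt_state, symbol, "R"))
    return new_delta


def enters_state(delta, start, w, target, blank="_", max_steps=1000):
    """Bounded simulation: is `target` reached within max_steps moves?"""
    tape, pos, state = dict(enumerate(w)), 0, start
    for _ in range(max_steps):
        if state == target:
            return True
        key = (state, tape.get(pos, blank))
        if key not in delta:
            return False
        state, symbol, move = delta[key]
        tape[pos] = symbol
        pos += 1 if move == "R" else -1
    return state == target


# A machine that scans right over a's and halts on the first blank.
scan = {("q0", "a"): ("q0", "a", "R")}
scan_hat = add_halt_state(scan, {"q0"}, {"a", "_"})
print(enters_state(scan_hat, "q0", "aa", "q_halt"))
```

A machine with no undefined moves is left unchanged by the construction and never reaches the new state, matching the "if and only if" in the argument.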


Example 12.2 


The blank-tape halting problem is another problem to which the halting problem can be reduced. 
Given a Turing machine M, determine whether or not M halts if started with a blank tape. This is 
undecidable. 


To show how this reduction is accomplished, assume that we are given some M and some w. We
first construct from M a new machine M_w that starts with a blank tape, writes w on it, then positions
itself in a configuration q_0 w. After that, M_w acts like M. Clearly M_w will halt on a blank tape if and
only if M halts on w.

Suppose now that the blank-tape halting problem were decidable. Given any (M,w), we first
construct M_w, then apply the blank-tape halting problem algorithm to it. The conclusion tells us
whether M applied to w will halt. Since this can be done for any M and w, an algorithm for the
blank-tape halting problem can be converted into an algorithm for the halting problem. Since the latter is
known to be undecidable, the same must be true for the blank-tape halting problem.
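
The composition of the two steps can be sketched in a few lines. Here machines are modeled as Python callables, a machine started on a blank tape is a callable with no arguments, and blank_tape_halts is a hypothetical decider supplied by the caller; none of these names come from the text.

```python
def halting_decider_from(blank_tape_halts):
    """Turn a (hypothetical) blank-tape halting decider into a halting decider."""
    def halts(machine, w):
        def machine_w():
            # M_w: start with nothing, "write" w, then behave like M on w.
            return machine(w)
        # M_w halts on a blank tape if and only if M halts on w.
        return blank_tape_halts(machine_w)
    return halts
```

Constructing machine_w requires no knowledge of whether M halts, which is why the transformation itself is an algorithm; all of the difficulty sits in the assumed decider.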


The construction in the arguments of these two examples illustrates an approach common in
establishing undecidability results. A block diagram often helps us visualize the process. The
construction in Example 12.2 is summarized in Figure 12.3. In that diagram, we first use an algorithm
that transforms (M,w) into M_w; such an algorithm clearly exists. Next, we use the algorithm for
solving the blank-tape halting problem, which we assume exists. Putting the two together yields an
algorithm for the halting problem. But this is impossible, and we can conclude that the assumed
algorithm for the blank-tape halting problem cannot exist.


Figure 12.3 


Algorithm for the halting problem. 
A decision problem is effectively a function with a range {0,1}, that is, a true or false answer.
We can look also at more general functions to see if they are computable; to do so, we follow the 
established method and reduce the halting problem (or any other problem known to be undecidable) 
to the problem of computing the function in question. Because of Turing's thesis, we expect that 
functions encountered in practical circumstances will be computable, so for examples of 
uncomputable functions we must look a little further. Most examples of uncomputable functions are 
associated with attempts to predict the behavior of Turing machines. 


Example 12.3 


Let Γ = {0, 1, □}. Consider the function f(n) whose value is the maximum number of moves that can be
made by any n-state Turing machine that halts when started with a blank tape. This function, as it turns 
out, is not computable. 


Before we set out to demonstrate this, let us make sure that f(n) is defined for all n. Notice first
that there are only a finite number of Turing machines with n states. This is because Q and Γ are
finite, so δ has a finite domain and range. This in turn implies that there are only a finite number of
different δ’s and therefore a finite number of different n-state Turing machines.


Of all of the n-state machines, there are some that always halt, for example machines that have 
only final states and therefore make no moves. Some of the n-state machines will not halt when 
started with a blank tape, but they do not enter the definition of f. Every machine that does halt will
execute a certain number of moves; of these, we take the largest to give f (n). 


Take any Turing machine M and positive number m. It is easy to modify M to produce M̂ in such
a way that the latter will always halt with one of two answers: M applied to a blank tape halts in no
more than m moves, or M applied to a blank tape makes more than m moves. All we have to do for
this is to have M̂ count its moves and terminate when this count exceeds m. Assume now that f(n) is
computable by some Turing machine F. We can then put M̂ and F together as shown in Figure 12.4.
First we compute f(|Q|), where Q is the state set of M. This tells us the maximum number of moves
that M can make if it is to halt. The value we get is then used as m to construct M̂ as outlined, and a
description of M̂ is given to a universal Turing machine for execution. This tells us whether M
applied to a blank tape halts within f(|Q|) moves or makes more than f(|Q|) moves. If we find that M
applied to a blank tape makes more than f(|Q|) moves, then because of the definition of f, the
implication is that M never halts. Thus we have a solution to the blank-tape halting problem. The
impossibility of the conclusion forces us to accept that f is not computable.
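
The move-counting construction is easy to make concrete. In the sketch below, a transition function is a dictionary mapping (state, symbol) to (state, symbol, move), the blank is written "_", and f is passed in as a parameter because no computable f exists; the names halts_within and blank_tape_halts are assumptions made for this illustration.

```python
def halts_within(delta, start, m, blank="_"):
    """Simulate a machine on a blank tape for at most m moves.

    Returns True if the machine halts after making no more than m moves,
    and False if it is still able to move after m moves.
    """
    tape, pos, state = {}, 0, start
    for _ in range(m):
        key = (state, tape.get(pos, blank))
        if key not in delta:
            return True          # no move defined: the machine has halted
        state, symbol, move = delta[key]
        tape[pos] = symbol
        pos += 1 if move == "R" else -1
    # Exactly m moves were made; the machine has halted only if it is stuck now.
    return (state, tape.get(pos, blank)) not in delta


def blank_tape_halts(delta, states, start, f):
    """Decide blank-tape halting, given a (hypothetical) way to compute f."""
    return halts_within(delta, start, f(len(states)))


# A machine that writes two 1's and then halts after exactly two moves.
two_moves = {("q0", "_"): ("q1", "1", "R"), ("q1", "_"): ("q2", "1", "R")}
print(halts_within(two_moves, "q0", 2), halts_within(two_moves, "q0", 1))
```

The bounded simulation always terminates; the only ingredient that cannot be supplied in practice is the function f itself.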


Figure 12.4 


Algorithm for the blank-tape halting problem. 
EXERCISES


1. Describe in detail how H in Theorem 12.1 can be modified to produce H’.


2. Suppose we change Definition 12.1 to require that q_0 w_M w ⊢*_H q_y w or q_0 w_M w ⊢*_H q_n w,


depending on whether M applied to w halts or not. Reexamine the proof of Theorem 12.1 to show 
that this difference in the definition does not affect the proof in any significant way. 


3. Show that the following problem is undecidable. Given any Turing machine M, a ∈ Γ, and w ∈ Σ+,
determine whether or not the symbol a is ever written when M is applied to w. 


4. In the general halting problem, we ask for an algorithm that gives the correct answer for any M and 
w. We can relax this generality, for example, by looking for an algorithm that works for all M but 
only a single w. We say that such a problem is decidable if for every w there exists a (possibly 


different) algorithm that determines whether or not (M,w) halts. Show that even in this restricted 
setting the problem is undecidable. 


5. Show that there is no algorithm to decide whether or not an arbitrary Turing machine halts on all 
input. 


6. Consider the question: “Does a Turing machine in the course of a computation revisit the starting 
cell (i.e., the cell under the read-write head at the beginning of the computation)?” Is this a 
decidable question? 


7. Show that there is no algorithm for deciding if any two Turing machines M_1 and M_2 accept the
same language. 


8. How is the conclusion of Exercise 7 affected if M_2 is a finite automaton?


9. Is the halting problem solvable for deterministic pushdown automata; that is, given a pda as in 
Definition 7.3, can we always predict whether or not the automaton will halt on input w? 


10. Let M be any Turing machine and x and y two possible instantaneous descriptions of it. Show that 
the problem of determining whether or not 
x ⊢*_M y
is undecidable. 


11. In Example 12.3, give the values of f(1) and f (2). 


12. Show that the problem of determining whether a Turing machine halts on any input is 
undecidable. 


13. Let B be the set of all Turing machines that halt when started with a blank tape. Show that this 
set is recursively enumerable, but not recursive. 


14. Consider the set of all n-state Turing machines with tape alphabet Γ = {0, 1, □}. Give an
expression for m(n), the number of distinct Turing machines with this Γ.


15. Let Γ = {0, 1, □} and let b(n) be the maximum number of tape cells examined by any n-state
Turing machine that halts when started with a blank tape. Show that b (n) is not computable. 


16. Determine whether or not the following statement is true: Any problem whose domain is finite 
is decidable. 


12.2 Undecidable Problems for Recursively Enumerable 
Languages 


We have determined that there is no membership algorithm for recursively enumerable languages. The 
lack of an algorithm to decide on some property is not an exceptional state of affairs for recursively 
enumerable languages, but rather is the general rule. As we now show, there is little we can say about 


these languages. Recursively enumerable languages are so general that, in essence, any question we 
ask about them is undecidable. Invariably, when we ask a question about recursively enumerable 
languages, we find that there is some way of reducing the halting problem to this question. We give 
here some examples to show how this is done and from these examples derive an indication of the 
general situation. 


Theorem 12.3 


Let G be an unrestricted grammar. Then the problem of determining whether or not 


$L(G) = \emptyset$


is undecidable. 


Proof: We will reduce the membership problem for recursively enumerable languages to this
problem. Suppose we are given a Turing machine M and some string w. We can modify M as
follows. M first saves its input on some special part of its tape. Then, whenever it enters a final state,
it checks its saved input and accepts it if and only if it is w. We can do this by changing $\delta$ in a simple
way, creating for each w a machine $M_w$ such that


$L(M_w) = L(M) \cap \{w\}.$


Using Theorem 11.7, we then construct a corresponding grammar $G_w$. Clearly, the construction
leading from M and w to $G_w$ can always be done. Equally clear is that $L(G_w)$ is nonempty if and only
if $w \in L(M)$.

Assume now that there exists an algorithm A for deciding whether or not $L(G) = \emptyset$. If we let T
denote an algorithm by which we generate $G_w$, then we can put T and A together as shown in Figure
12.5. Figure 12.5 is a Turing machine that for any M and w tells us whether or not w is in L(M). If
such a Turing machine existed, we would have a membership algorithm for any recursively
enumerable language, in direct contradiction to a previously established result. We conclude
therefore that the stated problem “$L(G) = \emptyset$” is not decidable. ■
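

The shape of this reduction can be sketched in Python. This is only an illustration under strong assumptions: a "machine" is modeled as a total Python predicate on strings (so every toy machine halts), the grammar step via Theorem 11.7 is skipped, and `make_toy_emptiness_test` is a hypothetical stand-in for the algorithm A that merely tries a finite list of candidate strings. The theorem says that no genuine A exists; the sketch shows only how T and A would be chained.

```python
from typing import Callable, Iterable

Acceptor = Callable[[str], bool]  # toy model: a total predicate standing in for a machine


def restrict_to(machine: Acceptor, w: str) -> Acceptor:
    """Build the modified machine: accept x exactly when x == w and `machine` accepts x."""
    def machine_w(x: str) -> bool:
        return x == w and machine(x)
    return machine_w


def membership_via_emptiness(machine: Acceptor, w: str,
                             is_empty: Callable[[Acceptor], bool]) -> bool:
    """Answer 'is w in L(M)?' using only an emptiness test on the modified machine."""
    return not is_empty(restrict_to(machine, w))


def make_toy_emptiness_test(candidates: Iterable[str]) -> Callable[[Acceptor], bool]:
    """Hypothetical stand-in for A: inspects a finite candidate list only."""
    pool = list(candidates)
    return lambda machine: not any(machine(x) for x in pool)


def even_length(x: str) -> bool:
    """Toy language: strings of even length."""
    return len(x) % 2 == 0


toy_A = make_toy_emptiness_test(["", "a", "ab", "aba", "abab"])
print(membership_via_emptiness(even_length, "ab", toy_A))   # True
print(membership_via_emptiness(even_length, "aba", toy_A))  # False
```

The stand-in works here only because the restricted language contains at most the single string w, which is in the candidate list.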


Figure 12.5


Theorem 12.4


Let M be any Turing machine. Then the question of whether or not L (M) is finite is undecidable. 


Proof: Consider the halting problem (M, w). From M we construct another Turing machine $\hat{M}$ that
does the following. First, the halting states of M are changed so that if any one is reached, all input is
accepted by $\hat{M}$. This can be done by having any halting configuration go to a final state. Second, the
original machine is modified so that the new machine $\hat{M}$ first generates w on its tape, then performs
the same computations as M, using the newly created w and some otherwise unused space. In other
words, the moves made by $\hat{M}$ after it has written w on its tape are the same as would have been made
by M had it started in the original configuration $q_0 w$. If M halts in any configuration, then $\hat{M}$ will halt
in a final state.


Therefore, if (M, w) halts, $\hat{M}$ will reach a final state for all input. If (M, w) does not halt, then $\hat{M}$
will not halt either and so will accept nothing. In other words, $\hat{M}$ accepts either the infinite language
$\Sigma^+$ or the finite language $\emptyset$.


If we now assume the existence of an algorithm A that tells us whether or not $L(\hat{M})$ is finite, we
can construct a solution to the halting problem as shown in Figure 12.6. Therefore, no algorithm for
deciding whether or not L(M) is finite can exist. ■


Figure 12.6


Notice that in the proof of Theorem 12.4, the specific nature of the question asked, namely “Is
L(M) finite?”, is immaterial. We can change the nature of the problem without significantly affecting
the argument.
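

The construction in the proof of Theorem 12.4 can be mimicked with a toy model. In the sketch below, a "machine" is assumed to be a Python generator function that yields once per move and returns when it halts; `build_modified_machine` and `halts_within` are hypothetical names introduced for illustration. Because halting cannot be decided, the helper only reports what happens within a step budget and returns `None` when the budget runs out.

```python
from typing import Callable, Iterator, Optional

Machine = Callable[[str], Iterator[None]]  # toy model: one `yield` per move; returning = halting


def build_modified_machine(machine: Machine, w: str) -> Machine:
    """The new machine ignores its own input x and replays the moves of `machine` on w."""
    def modified(x: str) -> Iterator[None]:
        yield from machine(w)   # same moves the original would make on w
        # falling off the end models entering a final state, so x is accepted
    return modified


def halts_within(machine: Machine, x: str, max_steps: int) -> Optional[bool]:
    """Return True if `machine` halts on x within max_steps moves, else None (unknown)."""
    moves = machine(x)
    for _ in range(max_steps + 1):
        try:
            next(moves)
        except StopIteration:
            return True
    return None


def halts_on_even_length(x: str) -> Iterator[None]:
    """Toy machine: halts after len(x) moves if len(x) is even, otherwise loops forever."""
    for _ in x:
        yield
    while len(x) % 2 == 1:
        yield


halting_case = build_modified_machine(halts_on_even_length, "ab")  # original halts on "ab"
looping_case = build_modified_machine(halts_on_even_length, "a")   # original loops on "a"
print([halts_within(halting_case, x, 100) for x in ["a", "b", "bbb"]])  # [True, True, True]
print([halts_within(looping_case, x, 100) for x in ["a", "b", "bbb"]])  # [None, None, None]
```

In the halting case every input is accepted, and in the looping case nothing is, which is the all-or-nothing behavior the proof exploits.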


Example 12.4 


Show that for an arbitrary Turing machine M with $\Sigma = \{a, b\}$, the problem “L(M) contains two
different strings of the same length” is undecidable.


To show this, we use exactly the same approach as in Theorem 12.4, except that when $\hat{M}$ reaches
a halting configuration, it will be modified to accept the two strings a and b. For this, the initial input
is saved and at the end of the computation compared with a and b, accepting only these two strings.


Thus, if (M, w) halts, $\hat{M}$ will accept two strings of equal length; otherwise $\hat{M}$ will accept nothing.
The rest of the argument then proceeds as in Theorem 12.4.


In exactly the same manner, we can substitute other questions such as “Does L (M) contain any 
string of length five?” or “Is L (M) regular?” without affecting the argument essentially. These 
questions, as well as similar questions, are all undecidable. A general result formalizing this is 
known as Rice's theorem. This theorem states that any nontrivial property of a recursively 
enumerable language is undecidable. The adjective “nontrivial” refers to a property possessed by 
some but not all recursively enumerable languages. A precise statement and a proof of Rice's theorem 
can be found in Hopcroft and Ullman (1979). 


EXERCISES 


1. Show in detail how the machine $\hat{M}$ in Theorem 12.4 is constructed.
2. Show that the two problems mentioned at the end of the preceding section, namely 
(a) L (M) contains any string of length five, 
(b) L (M) is regular, 
are undecidable. 


3. Let $M_1$ and $M_2$ be arbitrary Turing machines. Show that the problem “$L(M_1) \subseteq L(M_2)$” is
undecidable.


4. Let G be any unrestricted grammar. Does there exist an algorithm for determining whether or not 
L(G)* is recursively enumerable? 


5. Let G be any unrestricted grammar. Does there exist an algorithm for determining whether or not
$L(G) = L(G)^R$?


6. Let $G_1$ be any unrestricted grammar, and $G_2$ any regular grammar. Show that the problem
$L(G_1) \cap L(G_2) = \emptyset$
is undecidable.


7. Show that the question in Exercise 6 is undecidable for any fixed $G_2$, as long as $L(G_2)$ is not
empty.


8. For an unrestricted grammar G, show that the question “Is L(G) = L(G)*?” is undecidable. Argue 
(a) from Rice's theorem and (b) from first principles. 


12.3 The Post Correspondence Problem 


The undecidability of the halting problem has many consequences of practical interest, particularly in 
the area of context-free languages. But in many instances it is cumbersome to work with the halting 
problem directly, and it is convenient to establish some intermediate results that bridge the gap 
between the halting problem and other problems. These intermediate results follow from the 
undecidability of the halting problem, but are more closely related to the problems we want to study 
and therefore make the arguments easier. One such intermediate result is the Post correspondence
problem.

The Post correspondence problem can be stated as follows. Given two sequences of n strings on
some alphabet $\Sigma$, say


$A = w_1, w_2, \ldots, w_n$
and
$B = v_1, v_2, \ldots, v_n,$


we say that there exists a Post correspondence solution (PC-solution) for pair (A, B) if there is a
nonempty sequence of integers i, j, ..., k, such that


$w_i w_j \cdots w_k = v_i v_j \cdots v_k.$


The Post correspondence problem is to devise an algorithm that will tell us, for any (A, B), whether
or not there exists a PC-solution.


Example 12.5 


Let $\Sigma = \{0, 1\}$ and take A and B as


$w_1 = 11, \quad w_2 = 100, \quad w_3 = 111,$
$v_1 = 111, \quad v_2 = 001, \quad v_3 = 11.$


For this case, there exists a PC-solution as Figure 12.7 shows. 


Figure 12.7 


If we take


$w_1 = 00, \quad w_2 = 001, \quad w_3 = 1000,$
$v_1 = 0, \quad v_2 = 11, \quad v_3 = 011,$


there cannot be any PC-solution simply because any string composed of elements of A will be longer 
than the corresponding string from B. 


In specific instances we may be able to show by explicit construction that a pair (A, B) permits a
PC-solution, or we may be able to argue, as we did previously, that no such solution can exist. But in 
general, there is no algorithm for deciding this question under all circumstances. The Post 
correspondence problem is therefore undecidable. 
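

Although no general decision procedure exists, a bounded search is easy to write and illustrates what a PC-solution is. The sketch below is a minimal brute-force search over index sequences up to a chosen length; the function name `find_pc_solution` is hypothetical, and the first instance is assumed to be the one of Example 12.5 (w = 11, 100, 111 and v = 111, 001, 11). A result of `None` says only that no solution exists up to the bound, never that none exists at all.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def find_pc_solution(A: Sequence[str], B: Sequence[str],
                     max_len: int) -> Optional[Tuple[int, ...]]:
    """Try index sequences of length 1..max_len; return the first PC-solution (1-based)."""
    n = len(A)
    for length in range(1, max_len + 1):
        for idx in product(range(n), repeat=length):
            if "".join(A[i] for i in idx) == "".join(B[i] for i in idx):
                return tuple(i + 1 for i in idx)
    return None  # nothing found up to max_len; longer sequences remain unexplored


# Assumed instance of Example 12.5.
A = ["11", "100", "111"]
B = ["111", "001", "11"]
print(find_pc_solution(A, B, max_len=4))            # (1, 3)

# A toy instance in which every A-string is longer than its partner.
print(find_pc_solution(["ab"], ["a"], max_len=6))   # None
```

The search returns the shortest solution in lexicographic order, here 11·111 = 111·11; the sequence 1, 2, 3 is also a solution for this instance. The running time grows as n^max_len, so this is a demonstration, not a practical tool.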


To show this is a somewhat lengthy process. For the sake of clarity, we break it into two parts. In
the first part, we introduce the modified Post correspondence problem. We say that the pair (A, B)
has a modified Post correspondence solution (MPC solution) if there exists a sequence of integers
i, j, ..., k, such that


$w_1 w_i w_j \cdots w_k = v_1 v_i v_j \cdots v_k.$


In the modified Post correspondence problem, the first elements of the sequences A and B play a
special role. An MPC solution must start with $w_1$ on the left side and with $v_1$ on the right side. Note
that if there exists an MPC solution, then there is also a PC solution, but the converse is not true.


The modified Post correspondence problem is to devise an algorithm for deciding if an arbitrary 
pair (A,B) admits an MPC solution. This problem is also undecidable. We will demonstrate the 
undecidability of the modified Post correspondence problem by reducing a known undecidable 
problem, the membership problem for recursively enumerable languages, to it. To this end, we 
introduce the following construction. Suppose we are given an unrestricted grammar G = (V, T, S, P)
and a target string w. With these, we create the pair (A, B) as shown in Figure 12.8. In Figure 12.8,
the string $FS \Rightarrow$ is to be taken as $w_1$ and the string F as $v_1$. The order of the rest of the strings is
immaterial.


Figure 12.8


We want to claim eventually that $w \in L(G)$ if and only if the sets A and B constructed in this way
have an MPC solution. Since this is perhaps not immediately obvious, let us illustrate it with a simple 
example. 


Example 12.6 
Let G = ({A, B, C}, {a, b, c}, S, P) with productions


$S \to aABb \mid Bbb,$
$Bb \to C,$
$AC \to aac,$


and take w = aaac. The sequences A and B obtained from the suggested construction are given in
Figure 12.9. The string w = aaac is in L(G) and has a derivation


$S \Rightarrow aABb \Rightarrow aAC \Rightarrow aaac.$


How this derivation is paralleled by an MPC solution with the constructed sets can be seen in Figure
12.10, where the first two steps in the derivation are shown. The integers above and below the
derivation string show the indices for w and v, respectively, used to create the string.
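

The derivation itself can be checked mechanically. The following is a small sketch, assuming the productions of Example 12.6 are S → aABb | Bbb, Bb → C, and AC → aac; the helper names `one_step` and `is_derivation` are hypothetical. It treats sentential forms as plain strings and verifies that each step applies exactly one production.

```python
from typing import List, Set, Tuple

Production = Tuple[str, str]  # (left side, right side)

# Productions assumed from Example 12.6.
PRODUCTIONS: List[Production] = [
    ("S", "aABb"), ("S", "Bbb"), ("Bb", "C"), ("AC", "aac"),
]


def one_step(form: str, productions: List[Production]) -> Set[str]:
    """All strings obtainable from `form` by applying one production at one position."""
    results = set()
    for left, right in productions:
        start = form.find(left)
        while start != -1:
            results.add(form[:start] + right + form[start + len(left):])
            start = form.find(left, start + 1)
    return results


def is_derivation(steps: List[str], productions: List[Production]) -> bool:
    """Check that each string follows from the previous one by a single production."""
    return all(b in one_step(a, productions) for a, b in zip(steps, steps[1:]))


print(is_derivation(["S", "aABb", "aAC", "aaac"], PRODUCTIONS))  # True
print(is_derivation(["S", "aAC", "aaac"], PRODUCTIONS))          # False: skips a step
```

Checking a given derivation is easy; what is undecidable is whether some derivation of w exists.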


Examine Figure 12.10 carefully to see what is happening. We want to construct an MPC solution,
so we must start with $w_1$, that is, $FS \Rightarrow$. This string contains S, so to match it we have to use $v_{10}$ or $v_{11}$.
In this instance, we use $v_{10}$; this brings in $w_{10}$, leading us to the second string in the partial
derivation. Looking at several more steps, we see that the string $w_1 w_i w_j \ldots$ is always longer than the
corresponding string $v_1 v_i v_j \ldots$ and that the first is exactly one step ahead in the derivation. The only
exception is the last step, where $w_9$ must be applied to let the v-string catch up. The complete MPC
solution is shown in Figure 12.11. The construction, together with the example, indicates the lines
along which the next result is established.


Figure 12.9


Figure 12.10


Figure 12.11 


Theorem 12.5 


Let G = (V, T, S, P) be any unrestricted grammar, with w any string in $T^+$. Let (A, B) be the
correspondence pair constructed from G and w by the process exhibited in Figure 12.8. Then the pair
(A, B) permits an MPC solution if and only if $w \in L(G)$.


Proof: The proof involves a formal inductive argument based on the outlined reasoning. We will omit
the details. ■


With this result, we can reduce the membership problem for recursively enumerable languages to 
the modified Post correspondence problem and thereby demonstrate the undecidability of the latter. 


Theorem 12.6 


The modified Post correspondence problem is undecidable. 


Proof: Given any unrestricted grammar G = (V, T, S, P) and $w \in T^+$, we construct the sets A and B as
suggested above. By Theorem 12.5, the pair (A, B) has an MPC solution if and only if $w \in L(G)$.


Suppose now we assume that the modified Post correspondence problem is decidable. We can
then construct an algorithm for the membership problem of G as sketched in Figure 12.12. An
algorithm for constructing A and B from G and w clearly exists, but a membership algorithm for
G and w does not. We must therefore conclude that there cannot be any algorithm for deciding the
modified Post correspondence problem. ■


Figure 12.12


With this preliminary work, we are now ready to prove that the Post correspondence problem in its
original form is undecidable.


Theorem 12.7 


The Post correspondence problem is undecidable. 


Proof: We argue that if the Post correspondence problem were decidable, the modified Post 
correspondence problem would be decidable. 


Suppose we are given sequences $A = w_1, w_2, \ldots, w_n$ and $B = v_1, v_2, \ldots, v_n$ on some alphabet $\Sigma$. We then
introduce new symbols ¢ and § and the new sequences


$C = y_0, y_1, \ldots, y_{n+1},$
$D = z_0, z_1, \ldots, z_{n+1},$


defined as follows. For i = 1, 2, ..., n,

$y_i = w_{i1} ¢ w_{i2} ¢ \cdots w_{i m_i} ¢,$
$z_i = ¢ v_{i1} ¢ v_{i2} \cdots ¢ v_{i r_i},$

where $w_{ij}$ and $v_{ij}$ denote the jth letter of $w_i$ and $v_i$, respectively, and $m_i = |w_i|$, $r_i = |v_i|$. In words, $y_i$ is
created from $w_i$ by appending ¢ to each character, while $z_i$ is obtained by prefixing each character of $v_i$
with ¢. To complete the definition of C and D, we take


$y_0 = ¢ y_1, \quad y_{n+1} = §,$
$z_0 = z_1, \quad z_{n+1} = ¢ §.$

Figure 12.13


Consider now the pair (C, D), and suppose it has a PC solution. Because of the placement of ¢ and §,
such a solution must have $y_0$ on the left and $y_{n+1}$ on the right and so must look like


$¢ w_{11} ¢ w_{12} \cdots ¢ w_{j1} ¢ \cdots ¢ w_{k1} \cdots ¢ § = ¢ v_{11} ¢ v_{12} \cdots ¢ v_{j1} \cdots ¢ v_{k1} \cdots ¢ §.$

Ignoring the characters ¢ and §, we see that this implies

$w_1 w_j \cdots w_k = v_1 v_j \cdots v_k,$


so that the pair (A, B) permits an MPC solution. 


We can turn the argument around to show that if there is an MPC solution for (A,B) then there is a 
PC solution for the pair (C,D). 


Assume now that the Post correspondence problem is decidable. We can then construct the
machine shown in Figure 12.13. This machine clearly decides the modified Post correspondence
problem. But the modified Post correspondence problem is undecidable; consequently, we cannot
have an algorithm for deciding the Post correspondence problem. ■
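

The transformation from (A, B) to (C, D) is purely mechanical, as the following sketch suggests. It assumes the two new symbols are ¢ and § and that neither occurs in the original alphabet; the function names `mpc_to_pc` and `search` are hypothetical, and the sample pair is assumed to be the first instance of Example 12.5. A bounded search is included only to confirm, on this one instance, that an MPC solution for (A, B) corresponds to a PC solution for (C, D).

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

CENT, END = "¢", "§"  # the two new symbols; assumed not to occur in the alphabet


def mpc_to_pc(A: Sequence[str], B: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Build C = y_0..y_{n+1} and D = z_0..z_{n+1} from A = w_1..w_n and B = v_1..v_n."""
    ys = ["".join(ch + CENT for ch in w) for w in A]   # y_i: ¢ after each letter of w_i
    zs = ["".join(CENT + ch for ch in v) for v in B]   # z_i: ¢ before each letter of v_i
    C = [CENT + ys[0]] + ys + [END]                     # y_0 = ¢ y_1, y_{n+1} = §
    D = [zs[0]] + zs + [CENT + END]                     # z_0 = z_1,   z_{n+1} = ¢ §
    return C, D


def search(X: Sequence[str], Y: Sequence[str], max_len: int,
           first: Optional[int] = None) -> Optional[Tuple[int, ...]]:
    """Bounded search for 0-based indices with equal concatenations.

    If `first` is given, only sequences starting with that index are considered,
    which models the MPC requirement.
    """
    for length in range(1, max_len + 1):
        for idx in product(range(len(X)), repeat=length):
            if first is not None and idx[0] != first:
                continue
            if "".join(X[i] for i in idx) == "".join(Y[i] for i in idx):
                return idx
    return None


A = ["11", "100", "111"]
B = ["111", "001", "11"]
C, D = mpc_to_pc(A, B)
print(search(A, B, max_len=3, first=0))  # (0, 2): the MPC solution w_1 w_3 = v_1 v_3
print(search(C, D, max_len=3))           # (0, 3, 4): y_0 y_3 y_4 = z_0 z_3 z_4
```

Position 0 of C and D plays the role of the forced first pair, and the last position supplies the closing §, so the PC solution found for (C, D) is the MPC solution for (A, B) wrapped in these two extra pairs.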


EXERCISES 


1. Let A = {001, 0011, 11, 101} and B = {01, 111, 111, 010}. Does the pair (A, B) have a PC solution?
Does it have an MPC solution?


2. Provide the details of the proof of Theorem 12.5. 


3. Show that for $|\Sigma| = 1$, the Post correspondence problem is decidable, that is, there is an algorithm
that can decide whether or not (A, B) has a PC solution for any given (A, B) on a single-letter
alphabet.


4. Suppose we restrict the domain of the Post correspondence problem to include only alphabets 
with exactly two symbols. Is the resulting correspondence problem decidable? 


5. Show that the following modifications of the Post correspondence problem are undecidable.
(a) There is an MPC solution if there is a sequence of integers such that $w_i w_j \cdots w_k w_1 = v_i v_j \cdots v_k v_1$.
(b) There is an MPC solution if there is a sequence of integers such that $w_1 w_2 w_i w_j \cdots w_k = v_1 v_2 v_i v_j \cdots v_k$.


6. The correspondence pair (A, B) is said to have an even PC solution if and only if there exists a
nonempty sequence of even integers i, j, ..., k such that $w_i w_j \cdots w_k = v_i v_j \cdots v_k$. Show that the problem
of deciding whether or not an arbitrary pair (A, B) has an even PC solution is undecidable.


12.4 Undecidable Problems for Context-Free Languages 


The Post correspondence problem is a convenient tool for studying undecidable questions for context- 
free languages. We illustrate this with a few selected results. 


Theorem 12.8 


There exists no algorithm for deciding whether any given context-free grammar is ambiguous. 


Proof: Consider two sequences of strings $A = (w_1, w_2, \ldots, w_n)$ and $B = (v_1, v_2, \ldots, v_n)$ over some alphabet
$\Sigma$. Choose a new set of distinct symbols $a_1, a_2, \ldots, a_n$ such that


$\{a_1, a_2, \ldots, a_n\} \cap \Sigma = \emptyset,$

and consider the two languages


$L_A = \{w_i w_j \cdots w_l w_k a_k a_l \cdots a_j a_i\}$

and

$L_B = \{v_i v_j \cdots v_l v_k a_k a_l \cdots a_j a_i\}.$

Now look at the context-free grammar

$G = (\{S, S_A, S_B\}, \Sigma \cup \{a_1, a_2, \ldots, a_n\}, P, S),$

where the set of productions P is the union of the two subsets: The first set $P_A$ consists of


$S \to S_A,$
$S_A \to w_i S_A a_i \mid w_i a_i, \quad i = 1, 2, \ldots, n,$


while the second set $P_B$ has the productions


$S \to S_B,$
$S_B \to v_i S_B a_i \mid v_i a_i, \quad i = 1, 2, \ldots, n.$

Take

$G_A = (\{S, S_A\}, \Sigma \cup \{a_1, a_2, \ldots, a_n\}, P_A, S)$

and

$G_B = (\{S, S_B\}, \Sigma \cup \{a_1, a_2, \ldots, a_n\}, P_B, S);$

then clearly

$L_A = L(G_A),$
$L_B = L(G_B),$

and

$L(G) = L_A \cup L_B.$


It is easy to see that $G_A$ and $G_B$ by themselves are unambiguous. If a given string in L(G) ends
with $a_i$, then its derivation with grammar $G_A$ must have started with $S \Rightarrow S_A \Rightarrow w_i S_A a_i$. Similarly, we can tell
at any later stage which rule has to be applied. Thus, if G is ambiguous it must be because there is a w
for which there are two derivations


$S \Rightarrow S_A \Rightarrow w_i S_A a_i \Rightarrow^* w_i w_j \cdots w_k a_k \cdots a_j a_i = w$

and

$S \Rightarrow S_B \Rightarrow v_i S_B a_i \Rightarrow^* v_i v_j \cdots v_k a_k \cdots a_j a_i = w.$


Consequently, if G is ambiguous, then the Post correspondence problem with the pair (A, B) has a
solution. Conversely, if G is unambiguous, then the Post correspondence problem cannot have a
solution.


If there existed an algorithm for solving the ambiguity problem, we could adapt it to solve the
Post correspondence problem as shown in Figure 12.14. But since there is no algorithm for the Post
correspondence problem, we conclude that the ambiguity problem is undecidable. ■
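

To see the link between PC-solutions and ambiguity concretely, the following sketch builds the productions of G for a given pair and computes the string derived along a chosen index sequence in each half of the grammar. It rests on several assumptions: the marker symbols are written as the tokens `<a1>`, `<a2>`, ... (assumed not to occur in the alphabet), the function names are hypothetical, and the sample pair is assumed to be the first instance of Example 12.5.

```python
from typing import List, Sequence


def grammar_productions(A: Sequence[str], B: Sequence[str]) -> List[str]:
    """List the productions S -> S_A | S_B, S_A -> w_i S_A a_i | w_i a_i, and likewise for S_B."""
    rules = ["S -> S_A", "S -> S_B"]
    for i, (w, v) in enumerate(zip(A, B), start=1):
        rules += [f"S_A -> {w} S_A <a{i}>", f"S_A -> {w} <a{i}>",
                  f"S_B -> {v} S_B <a{i}>", f"S_B -> {v} <a{i}>"]
    return rules


def derived_string(strings: Sequence[str], indices: Sequence[int]) -> str:
    """String derived by using the rules for the 1-based indices i, j, ..., k from the outside in."""
    body = "".join(strings[i - 1] for i in indices)
    markers = "".join(f"<a{i}>" for i in reversed(indices))  # a_k ... a_j a_i
    return body + markers


A = ["11", "100", "111"]
B = ["111", "001", "11"]

# The index sequence 1, 3 is a PC-solution, so both halves derive the same string.
print(derived_string(A, [1, 3]))                             # 11111<a3><a1>
print(derived_string(A, [1, 3]) == derived_string(B, [1, 3]))  # True
# The sequence 1, 2 is not a PC-solution, so the two strings differ.
print(derived_string(A, [1, 2]) == derived_string(B, [1, 2]))  # False
```

The shared string has one derivation through S_A and another through S_B, which is exactly the kind of ambiguity the proof describes.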


Figure 12.14


Theorem 12.9


There exists no algorithm for deciding whether or not


$L(G_1) \cap L(G_2) = \emptyset$


for arbitrary context-free grammars $G_1$ and $G_2$.


Proof: Take as $G_1$ the grammar $G_A$ and as $G_2$ the grammar $G_B$ as defined in the proof of Theorem
12.8. Suppose that $L(G_A)$ and $L(G_B)$ have a common element, that is,


$S_A \Rightarrow^* w_i w_j \cdots w_k a_k \cdots a_j a_i$


and


$S_B \Rightarrow^* v_i v_j \cdots v_k a_k \cdots a_j a_i.$

Then the pair (A, B) has a PC solution. Conversely, if the pair does not have a PC solution, then
$L(G_A)$ and $L(G_B)$ cannot have a common element. We conclude that $L(G_A) \cap L(G_B)$ is nonempty if
and only if (A, B) has a PC solution. This reduction proves the theorem. ■


There is a variety of other known results along these lines. Some of them can be reduced to the 
Post correspondence problem, while others are more easily solved by establishing different 
intermediate results first (see, for example, Exercises 6 and 7 at the end of this section). We will not 
give the arguments here, but point to some additional results in the exercises. 


That there are many undecidable problems connected with context-free languages seems
surprising at first and shows that there are limitations to computations in an area in which we might
be tempted to try an algorithmic approach. For example, it would be helpful if we could tell if a
programming language defined in BNF is ambiguous, or if two different specifications of a language
are in fact equivalent. But the results that have been established tell us that this is not possible, and it
would be a waste of time to look for an algorithm for either of these tasks. Keep in mind that this does
not rule out the possibility that there may be ways of getting the answer for specific cases or perhaps
even most interesting ones. What the undecidability results tell us is that there is no completely
general algorithm and that no matter how many different cases a method can handle, there are
invariably some situations for which it will break down.


EXERCISES 


1. Prove the claim made in Theorem 12.8 that $G_A$ and $G_B$ by themselves are unambiguous.


2. Show that the problem of determining whether or not
$L(G_1) \subseteq L(G_2)$
is undecidable for context-free grammars $G_1$, $G_2$.


3. Show that for arbitrary context-free grammars $G_1$ and $G_2$, the problem “$L(G_1) \cap L(G_2)$ is
context-free” is undecidable.


4. Show that if the language $L(G_A) \cap L(G_B)$ in Theorem 12.8 is regular, then it must be empty. Use
this to show that the problem “L(G) is regular” is undecidable for context-free G.


5. Let $L_1$ be a regular language and G a context-free grammar. Show that the problem “$L_1 \subseteq L(G)$”
is undecidable.


6. Let M be any Turing machine. We can assume without loss of generality that every computation
involves an even number of moves. For any such computation


$q_0 w \vdash x_1 \vdash x_2 \vdash \cdots \vdash x_n,$

we can then construct the string

$q_0 w \vdash x_1^R \vdash x_2 \vdash x_3^R \vdash \cdots \vdash x_n.$

This is called a valid computation.
Show that for every M we can construct three context-free grammars $G_1$, $G_2$, $G_3$ such that

(a) the set of all valid computations is $L(G_1) \cap L(G_2)$, and

(b) the set of all invalid computations (that is, the complement of the set of valid computations)
is $L(G_3)$.

Use the results to show that “$L(G) = \Sigma^*$” is undecidable over the domain of all context-free
grammars G.


7. Let $G_1$ be a context-free grammar and $G_2$ a regular grammar. Is the problem
$L(G_1) \cap L(G_2) = \emptyset$
decidable?


8. Let $G_1$ and $G_2$ be grammars with $G_1$ regular. Is the problem
$L(G_1) = L(G_2)$
decidable when
(a) $G_2$ is unrestricted,
(b) $G_2$ is context-free,
(c) $G_2$ is regular?


12.5 A Question of Efficiency 


As long as we are concerned only with computability or decidability, it makes little difference what 
model of Turing machine we use. But when we start looking at possible practical concerns, such as 
ease of implementation or efficiency, significant distinctions appear quickly. Here are two examples 
that give us a first look at these issues. 


Example 12.7 


In Example 9.7 we constructed a single-tape Turing machine for the language

$L = \{a^n b^n : n \geq 1\}.$


A look at that algorithm will show that for $w = a^n b^n$ it takes roughly 2n steps to match each a with the
corresponding b. Therefore, the whole computation takes $O(n^2)$ moves.

But, as we later indicated in Example 10.1, with a two-tape machine we can use a different
algorithm. We first copy all the a's to the second tape, then match them against the b's on the first. The
situation before and after the copying is shown in Figure 12.15. Both the copying and the matching can
be done in O(n) moves, so this algorithm is much more efficient.
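

The difference in growth can be made visible by counting moves in a small simulation. The sketch below is not a transcription of the transition functions from the earlier examples; it is a simplified rendition of the two strategies, assuming that each head movement counts as one move, that writing a symbol is free, and that the two heads of the two-tape machine move in the same step. The function names are hypothetical.

```python
def single_tape_moves(n: int) -> int:
    """Count head moves of a mark-and-match strategy on a^n b^n using one tape."""
    tape = list("a" * n + "b" * n)
    head, moves = 0, 0
    for _ in range(n):
        tape[head] = "x"              # mark the leftmost unmarked a
        while tape[head] != "b":      # travel right to the leftmost unmarked b
            head += 1
            moves += 1
        tape[head] = "y"              # mark that b
        while tape[head] != "x":      # travel back left to the most recent x
            head -= 1
            moves += 1
        head += 1                     # step right onto the next unmarked symbol
        moves += 1
    return moves


def two_tape_moves(n: int) -> int:
    """Count moves of the copy-then-match strategy on a^n b^n using two tapes."""
    tape1 = "a" * n + "b" * n
    tape2 = []                        # second tape, used as a store for the copied a's
    head1, moves = 0, 0
    while head1 < len(tape1) and tape1[head1] == "a":
        tape2.append("a")             # copy one a to the second tape
        head1 += 1
        moves += 1
    while head1 < len(tape1) and tape1[head1] == "b" and tape2:
        tape2.pop()                   # match one b against one copied a
        head1 += 1
        moves += 1
    return moves


for n in (10, 20, 40):
    print(n, single_tape_moves(n), two_tape_moves(n))
# 10 210 20
# 20 820 40
# 40 3240 80
```

Under these counting assumptions the single-tape strategy uses n(2n + 1) moves and the two-tape strategy uses 2n, so doubling n roughly quadruples the first count and only doubles the second.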


Figure 12.15 


(a) Initial tapes (b) Tapes after copying of a's 


Example 12.8 


In Sections 5.2 and 6.3 we discussed the membership problem for context-free languages. If we take
the length of the input string w as the problem size n, then the exhaustive search takes $O(n^M)$ steps,
where M depends on the grammar. The more efficient CYK algorithm requires an amount of work
$O(n^3)$. Both of these algorithms are deterministic.


A nondeterministic algorithm for this problem proceeds by simply guessing which sequence of
productions is applied in the derivation of w. If we work with a grammar that has no unit-productions or
$\lambda$-productions, the length of the derivation is essentially |w|, so we have an O(n) algorithm.
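

For comparison with the nondeterministic guess, here is a minimal sketch of the deterministic CYK algorithm. It assumes a grammar in Chomsky normal form; the particular grammar used (S → AB | AC, C → SB, A → a, B → b, generating a^n b^n for n ≥ 1) and the names `UNARY`, `BINARY`, and `cyk` are chosen for this illustration and do not come from the text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical grammar in Chomsky normal form for {a^n b^n : n >= 1}:
#   S -> AB | AC,   C -> SB,   A -> a,   B -> b
UNARY: Dict[str, Set[str]] = {"a": {"A"}, "b": {"B"}}
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("A", "B"): {"S"}, ("A", "C"): {"S"}, ("S", "B"): {"C"},
}


def cyk(w: str, start: str = "S") -> bool:
    """Decide membership of w by filling the CYK table bottom-up."""
    n = len(w)
    if n == 0:
        return False  # a CNF grammar of this form does not derive the empty string
    # table[i][l] holds the variables deriving the substring of length l + 1 starting at i
    table: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][0] = set(UNARY.get(ch, set()))
    for length in range(2, n + 1):                 # substring length
        for i in range(n - length + 1):            # start position
            for split in range(1, length):         # length of the left part
                for left in table[i][split - 1]:
                    for right in table[i + split][length - split - 1]:
                        table[i][length - 1] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


print([cyk(w) for w in ["ab", "aabb", "aaabbb"]])  # [True, True, True]
print([cyk(w) for w in ["", "a", "ba", "aab"]])    # [False, False, False, False]
```

The three nested loops over length, start position, and split point are what give the cubic bound for a fixed grammar.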


These examples suggest that efficiency questions are affected by the type of Turing machine we 
use and that the issue of determinism versus nondeterminism is a particularly crucial one. We will 
look at this in more detail in Chapter 14. 


EXERCISES 


1. Consider the language
$L = \{ww : w \in \{a, b\}^+\}.$
Discuss the construction and efficiency of algorithms for accepting L on

(a) a standard Turing machine,
(b) a two-tape deterministic Turing machine,
(c) a single-tape nondeterministic Turing machine,
(d) a two-tape nondeterministic Turing machine.


2. Repeat Exercise 1 for


$L = \{www : w \in \{a, b\}^+\}.$


Chapter 13 


Other Models of Computation


Although Turing machines are the most general models of computation we can construct, they
are not the only ones. At various times, other models have been proposed, some of which
at first glance seemed to be radically different from Turing machines. Eventually, however,
all the models were found to be equivalent. Much of the pioneering work in this area was
done in the period between 1930 and 1940 and a number of mathematicians, A. M. Turing
among them, contributed to it. The results that were found shed light not only on the concept of a
mechanical computation, but on mathematics as a whole.


Turing's work was published in 1936. No commercial computers were available at that time. In 
fact, the whole idea had been considered only in a very peripheral way. Although Turing's ideas 
eventually became very important in computer science, his original goal was not to provide a 
foundation for the study of digital computers. To understand what Turing was trying to do, we must 
briefly look at the state of mathematics at that time. 


With the discovery of differential and integral calculus by Newton and Leibniz in the seventeenth 
and eighteenth centuries, interest in mathematics increased and the discipline entered an era of 
explosive growth. A number of different areas were studied, and significant advances were made in 
almost all of them. By the end of the nineteenth century, the body of mathematical knowledge had 
become quite large. Mathematicians also had become sufficiently sophisticated to recognize that some 
logical difficulties had arisen that required a more careful approach. This led to a concern with rigor 
in reasoning and a consequent examination of the foundations of mathematical knowledge in the 
process. To see why this was necessary, consider what is involved in a typical proof in just about 
every book and paper dealing with mathematical subjects. A sequence of plausible claims is made, 
interspersed with phrases like “it can be seen easily” and “it follows from this.” Such phrases are 
conventional, and what one means by them is that, if challenged to do so, one could give more 
detailed reasoning. Of course, this is very dangerous, since it is possible to overlook things, use 
faulty hidden assumptions, or make wrong inferences. Whenever we see arguments like this, we 
cannot help but wonder if the proof we are given is indeed correct. Often there is no way of telling, 
and long and involved proofs have been published and found erroneous only after a considerable 
amount of time. Because of practical limitations, however, this type of reasoning is accepted by most 
mathematicians. The arguments throw light on the subject and at least increase our confidence that the 
result is true. But to those demanding complete reliability, they are unacceptable. 


One alternative to such “sloppy” mathematics is to formalize as far as possible. We start with a 
set of assumed givens, called axioms, and precisely defined rules for logical inference and deduction. 
The rules are used in a sequence of steps, each of which takes us from one proven fact to another. The 
rules must be such that the correctness of their application can be checked in a routine and completely
mechanical way. A proposition is considered proven true if we can derive it from the axioms in a
finite sequence of logical steps. If the proposition conflicts with another proposition that can be 
proved to be true, then it is considered false. 


Finding such formal systems was a major goal of mathematics at the end of the nineteenth century. 
Two concerns immediately arose. The first was that the system should be consistent. By this we 
mean that there should not be any proposition that can be proved to be true by one sequence of steps, 
then shown to be false by another equally valid argument. Consistency is indispensable in 
mathematics, and anything derived from an inconsistent system would be contrary to all we agree on. 
A second concern was whether a system is complete, by which we mean that any proposition 
expressible in the system can be proved to be true or false. For some time it was hoped that consistent 
and complete systems for all of mathematics could be devised, thereby opening the door to rigorous
but completely mechanical theorem proving. But this hope was dashed by the work of K. Gödel. In his
famous incompleteness theorem, Gödel showed that any interesting consistent system must be
incomplete; that is, it must contain some unprovable propositions. Gödel's revolutionary conclusion
was published in 1931. 


Gödel's work left unanswered the question of whether the unprovable statements could somehow
be distinguished from the provable ones, so that there was still some hope that most of mathematics 
could be made precise with mechanically verifiable proofs. It was this problem that Turing and other 
mathematicians of the time, particularly A. Church, S. C. Kleene, and E. Post, addressed. In order to 
study the question, a variety of formal models of computation were established. Prominent among 
them were the recursive functions of Church and Kleene and Post systems, but there are many other 
such systems that have been studied. In this chapter we briefly review some of the ideas that arose out 
of these studies. There is a wealth of material here that we cannot cover. We will give only a very 
brief presentation, referring the reader to other references for detail. A quite accessible account of 
recursive functions and Post systems can be found in Denning, Dennis, and Qualitz (1978), while a 
good discussion of various other rewriting systems is given in Salomaa (1973) and Salomaa (1985). 


The models of computation we study here, as well as others that have been proposed, have 
diverse origins. But it was eventually found that they were all equivalent in their power to carry out 
computations. The spirit of this observation is generally called Church's thesis. This thesis states 
that all possible models of computation, if they are sufficiently broad, must be equivalent. It also 
implies that there is an inherent limitation in this and that there are functions that cannot be expressed 
in any way that gives an explicit method for their computation. The claim is of course very closely 
related to Turing's thesis, and the combined notion is sometimes called the Church-Turing thesis. It 
provides a general principle for algorithmic computation and, while not provable, gives strong 
evidence that no more powerful models can be found. 


13.1 Recursive Functions 


The concept of a function is fundamental to much of mathematics. As summarized in Section 1.1, a 
function is a rule that assigns to an element of one set, called the domain of the function, a unique 
value in another set, called the range of the function. This is very broad and general and immediately 
raises the question of how we can explicitly represent this association. There are many ways in which 
functions can be defined. Some of them we use frequently, while others are less common. 


We are all familiar with functional notation in which we write expressions like
$f(n) = n^2 + 1.$
This defines the function f by means of a recipe for its computation: Given any value for the
argument n, multiply that value by itself, and then add one. Since the function is defined in this explicit
way, we can compute its values in a strictly mechanical fashion. To complete the definition of f, we
also must specify its domain. If, for example, we take the domain to be the set of all integers, then the
range of f will be some subset of the set of positive integers.


Since many very complicated functions can be specified this way, we may well ask to what extent 
the notation is universal. If a function is defined (that is, we know the relation between the elements 
of its domain and its range), can it be expressed in such a functional form? To answer the question, 
we must first clarify what the permissible forms are. For this we introduce some basic functions, 
together with rules for building from them some more complicated ones. 


Primitive Recursive Functions 

To keep the discussion simple, we will consider only functions of one or two variables, whose
domain is either I, the set of all nonnegative integers, or $I \times I$, and whose range is in I. In this setting,
we start with the basic functions:


1. The zero function z(x) = 0, for all $x \in I$.


2. The successor function s(x), whose value is the integer next in sequence to x, that is, in the usual
notation, s(x) = x + 1.


3. The projector functions
$p_k(x_1, x_2) = x_k, \quad k = 1, 2.$

There are two ways of building more complicated functions from these:

1. Composition, by which we construct

$f(x, y) = h(g_1(x, y), g_2(x, y))$

from defined functions $g_1$, $g_2$, h.

2. Primitive recursion, by which a function can be defined recursively through

$f(x, 0) = g_1(x),$
$f(x, y + 1) = h(g_2(x, y), f(x, y)),$

from defined functions $g_1$, $g_2$, and h.


We illustrate how this works by showing how the basic operations of integer arithmetic can be 
constructed in this fashion. 


Example 13.1 


Addition of integers x and y can be implemented with the function add(x, y), defined by
add(x, 0) = x,
add(x, y + 1) = add(x, y) + 1.
To add 3 and 2, we apply these rules successively:
add(3, 2) = add(3, 1) + 1
= (add(3, 0) + 1) + 1
= (3 + 1) + 1
= 4 + 1 = 5.


Example 13.2 


Using the add function defined in Example 13.1, we can now define multiplication by 
mult (x, 0) = 0, 
mult (x, y + 1) = add (x, mult (x, y)). 


Formally, the second step is an application of primitive recursion, in which h is identified with the
add function, and g_2(x, y) is the projector function p_1(x, y).
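The constructions of Examples 13.1 and 13.2 can be sketched in Python. This is only an illustration under stated assumptions: the helper name prim_rec is ours, the recursion is unrolled into a loop, and the one-variable identity used as g_1 for add is an assumed analogue of the projector functions.

```python
def z(x):
    """Zero function: z(x) = 0."""
    return 0

def s(x):
    """Successor function: s(x) = x + 1."""
    return x + 1

def p(k):
    """Projector p_k(x1, x2) = x_k for k = 1, 2."""
    return lambda x1, x2: (x1, x2)[k - 1]

def prim_rec(g1, g2, h):
    """Build f with f(x, 0) = g1(x) and f(x, y + 1) = h(g2(x, y), f(x, y))."""
    def f(x, y):
        value = g1(x)                  # base case f(x, 0)
        for i in range(y):             # unfold the recursion bottom-up
            value = h(g2(x, i), value)
        return value
    return f

# add(x, 0) = x;  add(x, y + 1) = s(add(x, y))
# (the identity below is an assumed one-variable projector)
add = prim_rec(lambda x: x, p(1), lambda _, prev: s(prev))

# mult(x, 0) = 0;  mult(x, y + 1) = add(x, mult(x, y)), with h = add and g2 = p_1
mult = prim_rec(z, p(1), add)

print(add(3, 2), mult(2, 3))
```

Because prim_rec only ever runs a loop of exactly y iterations, every function built this way from total functions terminates.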


Example 13.3 


Subtraction is not quite so obvious. First, we must define it, taking into account that negative
numbers are not permitted in our system. A kind of subtraction is defined from usual subtraction by

x ∸ y = x − y if x ≥ y,
x ∸ y = 0 if x < y.

The operator ∸ is sometimes called the monus; it defines subtraction so that its range is I.
Now we define the predecessor function

pred (0) = 0,
pred (y + 1) = y,
and from it, the subtracting function
subtr (x, 0) = x,
subtr (x, y + 1) = pred (subtr (x, y)).
To prove that 5 − 3 = 2, we reduce the proposition by applying the definitions a number of times:


subtr (5, 3) = pred (subtr (5, 2))
= pred (pred (subtr (5, 1)))
= pred (pred (pred (subtr (5, 0))))
= pred (pred (pred (5)))
= pred (pred (4))
= pred (3)
= 2.


In much the same way, we can define integer division, but we will leave the demonstration of it as 
an exercise. If we accept this as given, we see that the basic arithmetic operations are all 
constructible by the elementary processes described. With the algebraic operations precisely defined, 
other more complicated ones can now be constructed, and very complex computations built from the 
simple ones. We call functions that can be constructed in such a manner primitive recursive. 
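As a small illustration, the predecessor and subtracting functions of Example 13.3 can be transcribed into Python. This is a sketch only; the loop stands in for the recursion on y, and the function names simply follow the example.

```python
def pred(y):
    """pred(0) = 0, pred(y + 1) = y."""
    return 0 if y == 0 else y - 1

def subtr(x, y):
    """subtr(x, 0) = x, subtr(x, y + 1) = pred(subtr(x, y))."""
    value = x
    for _ in range(y):       # apply pred once for each unit of y
        value = pred(value)
    return value

print(subtr(5, 3), subtr(3, 5))
```

Note that subtr never leaves the nonnegative integers: once the value reaches 0, further applications of pred keep it there.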


Definition 13.1 


A function is called primitive recursive if and only if it can be constructed from the basic functions z,
s, p_k, by successive composition and primitive recursion.


Note that if g_1, g_2, and h are total functions, then f defined by composition and primitive recursion
is also a total function. It follows from this that every primitive recursive function is a total function
on I or I × I.


The expressive power of primitive recursive functions is considerable, and most common 
functions are primitive recursive. However, not all functions are in this class, as the following 
argument shows. 


Theorem 13.1 


Let F denote the set of all functions from I to I. Then there is some function in F that is not primitive
recursive.


Proof: Every primitive recursive function can be described by a finite string that indicates how it is 
defined. Such strings can be encoded and arranged in standard order. Therefore, the set of all 
primitive recursive functions is countable. 


Suppose now that the set of all functions is also countable. We can then write all functions in
some order, say, f_1, f_2, .... We next construct a function g defined as

g(i) = f_i(i) + 1,  i = 1, 2, ....

Clearly, g is well defined and is therefore in F, but equally clearly, g differs from every f_i in the
diagonal position. This contradiction proves that F cannot be countable.

Combining these two observations proves that there must be some function in F that is not
primitive recursive. ■


Actually, this goes even further; not only are there functions that are not primitive recursive, there 
are in fact computable functions that are not primitive recursive. 


Theorem 13.2 


Let C be the set of all total computable functions from I to I. Then there is some function in C that is
not primitive recursive.


Proof: By the argument of the previous theorem, the set of all primitive recursive functions is
countable. Let us denote the functions in this set as r_1, r_2, ... and define a function g by

g(i) = r_i(i) + 1,  i = 1, 2, ....

By construction, the function g differs from every r_i and is therefore not primitive recursive. But
clearly g is computable, proving the theorem. ■
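The diagonal construction can be illustrated on a finite list. The four functions below are arbitrary stand-ins chosen for this sketch, not an actual enumeration of the primitive recursive functions; the point is only that g(i) = r_i(i) + 1 disagrees with each r_i at argument i.

```python
def diagonal(functions):
    """Return g with g(i) = r_i(i) + 1, where the list is indexed from 1."""
    def g(i):
        return functions[i - 1](i) + 1
    return g

# Hypothetical sample list r_1, ..., r_4 (an assumption of this sketch).
rs = [
    lambda n: 0,
    lambda n: n + 1,
    lambda n: n * n,
    lambda n: 2 ** n,
]

g = diagonal(rs)
for i in range(1, len(rs) + 1):
    print(i, rs[i - 1](i), g(i))   # g(i) always differs from r_i(i)
```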



The nonconstructive proof that there are computable functions that are not primitive recursive is a 
fairly simple exercise in diagonalization. The actual construction of an example of such a function is a 
much more complicated matter. We will give here one example that looks quite simple; however, the 
demonstration that it is not primitive recursive is quite lengthy. 


Ackermann's Function 


Ackermann's function is a function from I × I to I, defined by
A(0, y) = y + 1,
A(x, 0) = A(x − 1, 1),
A(x, y + 1) = A(x − 1, A(x, y)).


It is not hard to see that A is a total, computable function. In fact, it is quite elementary to write a 
recursive computer program for its computation. But in spite of its apparent simplicity, Ackermann's 
function is not primitive recursive. 
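A direct transcription of the three defining equations into Python might look as follows. This is a minimal sketch: it relies on Python's default recursion limit, so it is suitable only for small arguments.

```python
def ackermann(x, y):
    """Ackermann's function A(x, y) for nonnegative integers x and y."""
    if x == 0:
        return y + 1                                  # A(0, y) = y + 1
    if y == 0:
        return ackermann(x - 1, 1)                    # A(x, 0) = A(x - 1, 1)
    return ackermann(x - 1, ackermann(x, y - 1))      # A(x, y + 1) = A(x - 1, A(x, y))

print(ackermann(1, 3), ackermann(2, 2))
```

Even modest arguments lead to very deep recursion and very large values, which is a first hint of the rapid growth discussed next.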


Of course, we cannot argue directly from the definition of A. Even though this definition is not in 
the form required for a primitive recursive function, it is possible that an appropriate alternative 
definition could exist. The situation here is similar to the one we encountered when we tried to prove 
that a language was not regular or not context-free. We need to appeal to some general property of the 
class of all primitive recursive functions and show that Ackermann's function violates this property. 
For primitive recursive functions, one such property is the growth rate. There is a limit to how fast a 
primitive recursive function f(n) can grow as n → ∞, and Ackermann's function violates this limit.
That Ackermann's function grows very rapidly is easily demonstrated; see, for example, Exercises 9 
to 11 at the end of this section. How this is related to the limit of growth for primitive recursive 
functions is made precise in the following theorem. Its proof, which is tedious and technical, will be 
omitted. 


Theorem 13.3 


Let f be any primitive recursive function. Then there exists some integer n such that

f(i) < A(n, i),

for all i = n, n + 1, ....
Proof: For the details of the argument, see Denning, Dennis, and Qualitz (1978, p. 534). ■


If we accept this result, it follows easily that Ackermann's function is not primitive recursive. 


Theorem 13.4 


Ackermann's function is not primitive recursive. 
Proof: Consider the function

g(i) = A(i, i).

If A were primitive recursive, then so would g be. But then, according to Theorem 13.3, there exists an n
such that

g(i) < A(n, i),
for all i ≥ n. If we now pick i = n, we get the contradiction
g(n) = A(n, n)
< A(n, n),

proving that A cannot be primitive recursive. ■


μ Recursive Functions


To extend the idea of recursive functions to cover Ackermann's function and other computable 
functions, we must add something to the rules by which such functions can be constructed. One way is 
to introduce the μ or minimalization operator, defined by


μy(g(x, y)) = smallest y such that g(x, y) = 0.


In this definition, we assume that g is a total function. 
Example 13.4 


Let
g(x, y) = x + y − 3,
which is a total function. If x ≤ 3, then
y = 3 − x
is the result of the minimalization, but if x > 3, then there is no y ∈ I such that x + y − 3 = 0. Therefore,
μy(g(x, y)) = 3 − x, for x ≤ 3,
= undefined, for x > 3.


We see from this that even though g(x, y) is a total function, μy(g(x, y)) may only be partial.
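The minimalization in Example 13.4 can be imitated by an unbounded search. In the sketch below, the parameter bound is an assumption added so that the program terminates; a faithful μ operator would simply run forever when no suitable y exists, which is exactly what makes the resulting function partial.

```python
def mu(g, x, bound=1000):
    """Return the smallest y < bound with g(x, y) == 0, or None if none is found.

    The cutoff `bound` is an artifact of this sketch; returning None
    stands for "undefined".
    """
    for y in range(bound):
        if g(x, y) == 0:
            return y
    return None

def g(x, y):
    return x + y - 3

for x in range(6):
    print(x, mu(g, x))
```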


As the previous example shows, the minimalization operation opens the possibility of defining 
partial functions recursively. But it turns out that it also extends the power to define total functions so 
as to include all computable functions. Again, we merely quote the major result with references to the 
literature where the details may be found. 


Definition 13.2 


A function is said to be μ-recursive if it can be constructed from the basic functions by a sequence of
applications of the μ-operator and the operations of composition and primitive recursion.


Theorem 13.5 


A function is μ-recursive if and only if it is computable.
Proof: For a proof, see Denning, Dennis, and Qualitz (1978, Chapter 13). ■


The μ-recursive functions therefore give us another model for algorithmic computation.


EXERCISES 
1. Use the definitions in Examples 13.1 and 13.2 to prove that 3 + 4 = 7 and 2 * 3 = 6.
2. Define the function
greater(x, y) = 1 if x > y,
= 0 if x ≤ y.


Show that this function is primitive recursive. 


3. Consider the function
equals(x, y) = 1 if x = y,
= 0 if x ≠ y.


Show that this function is primitive recursive. 
4. Let f be defined by

f(x, y) = x if x ≠ y,
= 0 if x = y.
Show that this function is primitive recursive.


*5. Integer division can be defined by two functions div and rem:

div (x, y) = n,
where n is the largest integer such that x ≥ ny, and
rem (x, y) = x − ny.


Show that the functions div and rem are primitive recursive. 


6. Show that 
f(n) = 2^n
is primitive recursive. 
7. Show that the function 
g(x, y) = x^y


is primitive recursive. 


8. Write a computer program for computing Ackermann's function. Use it to evaluate A (2, 5) and A 
(3, 3). 


9. Prove the following for the Ackermann function.
(a) A(1, y) = y + 2.
(b) A(2, y) = 2y + 3.
(c) A(3, y) = 2^(y+3) − 3.
10. Use Exercise 9 to compute A(4, 1) and A(4, 2).
11. Give a general expression for A(4, y).
12. Show the sequence of recursive calls in the computation of A(5, 2).
13. Show that Ackermann's function is a total function in I × I.


14. Try to use the program constructed for Exercise 8 to evaluate A (5, 5). Can you explain what you 
observe? 


15. For each g below, compute μy(g(x, y)), and determine its domain.
(a) g(x, y) = xy.
(b) g(x, y) = 2^x + y − 3.
(c) g(x, y) = integer part of (x − 1) / (y + 1).
(d) g(x, y) = x mod (y + 1).


16. The definition of pred in Example 13.3, although intuitively clear, does not strictly adhere to the 
definition of a primitive recursive function. Show how the definition can be rewritten so that it 
has the correct form. 


13.2 Post Systems 


A Post system looks very much like an unrestricted grammar consisting of an alphabet and some 
production rules by which successive strings can be derived. But there are significant differences in 
the way in which the productions are applied. 


Definition 13.3 


A Post system Π is defined by
Π = (C, V, A, P),

where
C is a finite set of constants, consisting of two disjoint sets C_N, called the nonterminal
constants, and C_T, the set of terminal constants,

V is a finite set of variables,
A is a finite set from C*, called the axioms,
P is a finite set of productions.


The productions in a Post system must satisfy certain restrictions. They must be of the form

x_1 V_1 x_2 ⋯ V_n x_{n+1} → y_1 W_1 y_2 ⋯ W_m y_{m+1},  (13.1)

where x_i, y_i ∈ C*, and V_i, W_i ∈ V, subject to the requirement that any variable can appear at most once
on the left, so that

V_i ≠ V_j for i ≠ j,

and that each variable on the right must appear on the left, that is,

{W_1, W_2, ..., W_m} ⊆ {V_1, V_2, ..., V_n}.

Suppose we have a string of terminals of the form x_1 w_1 x_2 w_2 ⋯ w_n x_{n+1}, where the substrings x_1,
x_2, ... match the corresponding strings in (13.1) and w_i ∈ C*. We can then make the identification w_1 =
V_1, w_2 = V_2, ..., and substitute these values for the W's on the right of (13.1). Since every W is some V_i
that occurs on the left, it is assigned a unique value, and we get the new string y_1 w_i y_2 w_j ⋯ y_{m+1}. We
write this as

x_1 w_1 x_2 w_2 ⋯ x_{n+1} ⇒ y_1 w_i y_2 w_j ⋯ y_{m+1}.


As for a grammar, we can now talk about the language derived by a Post system. 


Definition 13.4 


The language generated by the Post system Π = (C, V, A, P) is

L(Π) = {w ∈ C_T* : w_0 ⇒* w for some w_0 ∈ A}.


Example 13.5 


Consider the Post system with

C_T = {a, b},
C_N = ∅,
V = {V_1},
A = {λ},
and production
V_1 → aV_1 b.
This allows the derivation
λ ⇒ ab ⇒ aabb.


In the first step, we apply (13.1) with the identification x_1 = λ, V_1 = λ, x_2 = λ, y_1 = a, W_1 = V_1, and y_2 =
b. In the second step, we re-identify V_1 = ab, leaving everything else the same. If you continue with
this, you will quickly convince yourself that the language generated by this particular Post system is

{a^n b^n : n ≥ 0}.
Example 13.6 
Consider the Post system with
C_T = {1, +, =},
C_N = ∅,
V = {V_1, V_2, V_3},
A = {1 + 1 = 11},
and productions
V_1 + V_2 = V_3 → V_1 1 + V_2 = V_3 1,
V_1 + V_2 = V_3 → V_1 + V_2 1 = V_3 1.
The system allows the derivation
1 + 1 = 11 ⇒ 11 + 1 = 111
⇒ 11 + 11 = 1111.
Interpreting the strings of 1's as unary representations of integers, the derivation can be written as
1 + 1 = 2 ⇒ 2 + 1 = 3 ⇒ 2 + 2 = 4.


The language generated by this Post system is the set of all identities of integer additions, such as 2 +
2 = 4, derived from the axiom 1 + 1 = 2.
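The derivations of Example 13.6 are mechanical enough to simulate. The sketch below is specific to this one Post system and is not a general interpreter: the shared left side V_1 + V_2 = V_3 is matched with a regular expression (an implementation choice of ours), which works here because each string contains exactly one + and one =, so the identification of the variables is unique.

```python
import re

# Left side shared by both productions: V1 + V2 = V3, each V_i a string of 1's.
LHS = re.compile(r"^(1*)\+(1*)=(1*)$")

# Right sides, written as functions of the matched values of V1, V2, V3.
RHS = [
    lambda v1, v2, v3: f"{v1}1+{v2}={v3}1",   # V1 + V2 = V3 -> V1 1 + V2 = V3 1
    lambda v1, v2, v3: f"{v1}+{v2}1={v3}1",   # V1 + V2 = V3 -> V1 + V2 1 = V3 1
]

def derive(axioms, steps):
    """Return every string derivable from the axioms in at most `steps` steps."""
    known = set(axioms)
    frontier = set(axioms)
    for _ in range(steps):
        new = set()
        for w in frontier:
            m = LHS.match(w)
            if m:
                new.update(rhs(*m.groups()) for rhs in RHS)
        frontier = new - known
        known |= frontier
    return known

theorems = derive({"1+1=11"}, 2)
print(sorted(theorems))
```

Every string produced this way is a correct unary addition identity, since each production adds one 1 to a summand and one 1 to the sum.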


Example 13.6 illustrates in a simple manner the original intent of Post systems as a mechanism for 
rigorously proving mathematical statements from a set of axioms. It also shows the inherent 
awkwardness of such a completely rigorous approach and why it is rarely used. But Post systems, 
even though they are cumbersome for proving complicated theorems, are general models for 
computation, as the next theorem shows. 


Theorem 13.6 


A language is recursively enumerable if and only if there exists some Post system that generates it. 


Proof: The arguments here are relatively simple and we sketch them briefly. First, since a derivation 
by a Post system is completely mechanical, it can be carried out on a Turing machine. Therefore, any 
language generated by a Post system is recursively enumerable. 


For the converse, remember that any recursively enumerable language is generated by some
unrestricted grammar G, having productions all of the form

x → y,

with x, y ∈ (V ∪ T)*. Given any unrestricted grammar G = (V, T, S, P), we create a Post system Π = (C, V_Π, A, P_Π),
where V_Π = {V_1, V_2}, C_N = V, C_T = T, A = {S}, and with productions

V_1 x V_2 → V_1 y V_2

for every production x → y of the grammar. It is then an easy matter to show that a w can be generated
by the Post system Π if and only if it is in the language generated by G. ■


EXERCISES 


1. For Σ = {a, b, c}, find a Post system that generates the following languages.
(a) L(a*b + ab*c).
(b) L = {ww}.
(c) L = {a^n b^n c^n}.


2. Find a Post system that generates
L = {ww^R : w ∈ {a, b}*}.


3. For Σ = {a}, what language does the Post system with axiom {a} and the following production
generate?


V_1 → V_1 V_1.


4. What language does the Post system in Exercise 3 generate if the axiom set is { a, ab} ? 


5. Find a Post system for proving the identities of integer multiplication using unary notation and
starting from the axiom 1 * 1 = 1.


6. Give the details of the proof of Theorem 13.6. 
7. What language does the Post system with

V_1 → aV_1 V_1

and axiom set {ab} generate?


8. A restricted Post system is one in which every production x → y satisfies, in addition to the usual
requirements, the further restriction that the number of variable occurrences on the right and left is
the same, i.e., n = m in (13.1). Show that for every language L generated by some Post system,
there exists a restricted Post system to generate L.


13.3 Rewriting Systems 


The various grammars we have studied have a number of things in common with Post systems: They
are all based on an alphabet in which strings are written, and some rules by which one string can be
obtained from another. Even a Turing machine can be viewed this way, since its instantaneous
description is a string that completely defines its configuration. The program is then just a set of rules
for producing one such string from a previous one. These observations can be formalized in the
concept of a rewriting system. Generally, a rewriting system consists of an alphabet Σ and a set of
rules or productions by which a string in Σ* can produce another. What distinguishes one rewriting
system from another is the nature of Σ and restrictions for the application of the productions.


The idea is quite broad and allows any number of specific cases in addition to the ones we have 
already encountered. Here we briefly introduce some less well-known ones that are interesting and 
also provide general models for computation. For details, see Salomaa (1973) and Salomaa (1985). 


Matrix Grammars 
Matrix grammars differ from the grammars we have previously studied (which are often called
phrase-structure grammars) in how the productions can be applied. For matrix grammars, the set of
productions consists of subsets P_1, P_2, ..., P_n, each of which is an ordered sequence

x_1 → y_1, x_2 → y_2, ....

Whenever the first production of some set P_i is applied, we must next apply the second one to the
string just created, then the third one, and so on. We cannot apply the first production of P_i unless all
other productions in this set can also be applied.


Example 13.7 
Consider the matrix grammar
P_1 : S → S_1 S_2,
P_2 : S_1 → aS_1, S_2 → bS_2 c,
P_3 : S_1 → λ, S_2 → λ.
A derivation with this grammar is
S ⇒ S_1 S_2 ⇒ aS_1 bS_2 c ⇒ aaS_1 bbS_2 cc ⇒ aabbcc.


Note that whenever the first rule of P_2 is used to create an a, the second one also has to be used,
producing a corresponding b and c. This makes it easy to see that the set of terminal strings generated
by this matrix grammar is

L = {a^n b^n c^n : n ≥ 0}.
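The all-or-nothing application of a matrix can be sketched in a few lines of Python. The names below are hypothetical: the single letters X and Y stand in for S_1 and S_2, and apply_matrix always rewrites the leftmost occurrence, a simplification that is harmless here because each nonterminal occurs at most once in any sentential form of this grammar.

```python
def apply_matrix(w, matrix):
    """Apply all productions of one matrix in order.

    Returns the new string, or None if some production in the sequence
    cannot be applied (in which case the matrix must not be used at all).
    """
    for left, right in matrix:
        if left not in w:
            return None
        w = w.replace(left, right, 1)   # rewrite the leftmost occurrence
    return w

# Example 13.7 with X for S_1 and Y for S_2 (renaming assumed for this sketch).
P1 = [("S", "XY")]
P2 = [("X", "aX"), ("Y", "bYc")]
P3 = [("X", ""), ("Y", "")]

w = "S"
for matrix in (P1, P2, P2, P3):
    w = apply_matrix(w, matrix)
    print(w)
```

Using P_2 exactly n times between P_1 and P_3 yields a^n b^n c^n, in agreement with the language stated in the example.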


Matrix grammars contain phrase-structure grammars as a special case in which each P_i contains
exactly one production. Also, since matrix grammars represent algorithmic processes, they are
governed by Church’s thesis. We conclude from this that matrix grammars and phrase-structure
grammars have the same power as models of computation. But, as Example 13.7 shows, sometimes
the use of a matrix grammar gives a much simpler solution than we can achieve with an unrestricted
phrase-structure grammar.


Markov Algorithms 
A Markov algorithm is a rewriting system whose productions

x → y
are considered ordered. In a derivation, the first applicable production must be used. Furthermore,
the leftmost occurrence of the substring x must be replaced by y. Some of the productions may be
singled out as terminal productions; they will be shown as

x →. y.

A derivation starts with some string w ∈ Σ* and continues either until a terminal production is used or
until there are no applicable productions.

For language acceptance, a set T ⊆ Σ of terminals is identified. Starting with a terminal string,
productions are applied until the empty string is produced.

Definition 13.5
Let M be a Markov algorithm with alphabet Σ and terminals T. Then the set
L(M) = {w ∈ T* : w ⇒* λ}

is the language accepted by M.


Example 13.8 


Consider the Markov algorithm with Σ = T = {a, b} and productions
ab → λ,
ba → λ.


Every step in the derivation annihilates a substring ab or ba, so

L(M) = {w ∈ {a, b}* : n_a(w) = n_b(w)}.


Example 13.9 


Find a Markov algorithm for
L = {a^n b^n : n ≥ 0}.
An answer is
ab → S,
aSb → S,
S →. λ.
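The rules for applying a Markov algorithm (first applicable production, leftmost occurrence, stop after a terminal production) translate directly into a short interpreter. The sketch below is an illustration under stated assumptions: productions are encoded as triples of our own devising, the last production of Example 13.9 is taken to be terminal, and max_steps is a safeguard we added against nonterminating derivations.

```python
def run_markov(w, productions, max_steps=10_000):
    """Run a Markov algorithm given as an ordered list of (left, right, is_terminal)."""
    for _ in range(max_steps):
        for left, right, is_terminal in productions:
            if left in w:
                w = w.replace(left, right, 1)   # leftmost occurrence only
                if is_terminal:
                    return w                    # a terminal production ends the derivation
                break                           # restart from the first production
        else:
            return w                            # no production is applicable
    raise RuntimeError("step limit reached")

def accepts(w, productions):
    """A string is accepted if the derivation ends with the empty string."""
    return run_markov(w, productions) == ""

EX_13_8 = [("ab", "", False), ("ba", "", False)]
EX_13_9 = [("ab", "S", False), ("aSb", "S", False), ("S", "", True)]

print(accepts("abba", EX_13_8), accepts("aabb", EX_13_9), accepts("abab", EX_13_9))
```

Tracing abab by hand gives abab ⇒ Sab ⇒ SS ⇒ S, where the last step uses the terminal production and stops the derivation with a nonempty string.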


If in this last example we take the first two productions and reverse the left and right sides, we get
a context-free grammar that generates the language L. In a certain sense, Markov algorithms are 
simply phrase-structure grammars working backward. This cannot be taken too literally, since it is 
not clear what to do with the last production. But the observation does provide a starting point for a 
proof of the following theorem that characterizes the power of Markov algorithms. 


Theorem 13.7 


A language is recursively enumerable if and only if there exists a Markov algorithm for it. 
Proof: See Salomaa (1985, p. 35). ■


L-Systems 


The origins of L-systems are quite different from what we might expect. Their developer, A. 
Lindenmayer, used them to model the growth pattern of certain organisms. L-systems are essentially 
parallel rewriting systems. By this we mean that in each step of a derivation, every symbol has to be 
rewritten. For this to make sense, the productions of an L-system must be of the form 


a → u,  (13.2)
where a ∈ Σ and u ∈ Σ*. When a string is rewritten, one such production must be applied to every
symbol of the string before the new string is generated.


Example 13.10 


Let Σ = {a} and
a → aa
define an L-system. Starting from the string a, we can make the derivation
a ⇒ aa ⇒ aaaa ⇒ aaaaaaaa.


The set of strings so derived is clearly


L = {a^(2^n) : n ≥ 0}.
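Parallel rewriting is easy to demonstrate in code. In this sketch the productions are stored in a dictionary with exactly one entry per symbol, which is an assumption that fits the example but not L-systems in general.

```python
def l_step(w, rules):
    """Rewrite every symbol of w simultaneously, one production per symbol."""
    return "".join(rules[c] for c in w)

def l_derive(w, rules, steps):
    """Return the list of strings produced in the first `steps` derivation steps."""
    strings = [w]
    for _ in range(steps):
        w = l_step(w, rules)
        strings.append(w)
    return strings

rules = {"a": "aa"}                  # the single production a -> aa
derivation = l_derive("a", rules, 3)
print(derivation)
```

Since every a is doubled in each step, the length of the string after n steps is 2^n.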


It is known that L-systems with productions of the form (13.2) are not sufficiently general to 
provide for all algorithmic computations. An extension of the idea provides the necessary 
generalization. In an extended L-system, productions are of the form 

(x, a, y) → u,
where a ∈ Σ and x, y, u ∈ Σ*, with the interpretation that a can be replaced by u only if it occurs as
part of the string xay. It is known that such extended L-systems are general models of computation.
For details, see Salomaa (1985). 


EXERCISES 


1. Find a matrix grammar for
L = {ww : w ∈ {a, b}*}.

2. What language is generated by the matrix grammar

P_1 : S → S_1 S_2,

P_2 : S_1 → aS_1 b, S_2 → bS_2 a,

P_3 : S_1 → λ, S_2 → λ?
3. Suppose that in Example 13.7 we change the last group of productions to 

P_3 : S_1 → λ, S_2 → S.


What language is generated by this matrix grammar? 


4. Why does the Markov algorithm in Example 13.9 not accept abab? 
5. Find a Markov algorithm that derives the language L = {a^n b^n c^n : n ≥ 1}.


*6. Find a Markov algorithm that accepts
L = {a^n b^m a^(nm) : n ≥ 1, m ≥ 1}.


7. Find an L-system that generates L(aa*). 
8. What is the set of strings generated by the L-system with productions 


when started with the string a? 


Chapter 14 


An Overview of 
Computational 
Complexity 


We now reconsider computational complexity, the study of the efficiency of algorithms.
Complexity, briefly mentioned in Chapter 11, complements computability by separating
problems that can be solved in practice from those that can be solved only in principle.
In studying complexity, it is necessary to ignore many details, such as the particulars of
hardware, software, data structures, and implementation, and look at the common,
fundamental issues. For this reason, we work mostly with orders-of-magnitude expressions. But, as
we will see, even such a very high-level view yields very useful results.


Efficiency is measured by resource requirements, such as time and space, so we can talk about 
time-complexity and space-complexity. Here we will limit ourselves to time-complexity, which is a 
rough measure of the time taken by a particular computation. There are many results for space- 
complexity as well, but time-complexity is a little more accessible and, at the same time, more useful. 


Computational complexity is an extensive topic, most of which is well beyond the scope of this 
text. There are some results, however, that are simply stated and easily appreciated, and that throw 
further light on the nature of languages and computation. In this chapter, we present a brief overview 
of the most salient results in complexity. Many proofs are difficult and we will dispense with them by 
reference to appropriate sources. Our intent here is to present the flavor of the subject matter without 
getting bogged down in the details. For this reason, we will allow ourselves a great deal of latitude, 
both in the selection of topics and in the formality of the discussion. 


14.1 Efficiency of Computation 


Let us start with a concrete example. Given a list of one thousand integers, we want to sort them in 
some way, say in ascending order. Sorting is a simple problem but also one that is very fundamental 
in computer science. If we now ask the question, “How long will it take to do this task?” we see 
immediately that much more information is needed before we can answer it. Clearly, the number of 
items in the list plays an important role in how much time will be taken, but there are other factors. 
There is the question of what computer we use and how we write the program. Also, there are a 
number of sorting methods so that selection of the algorithm is important. There are probably a few 
more things you can think of that need to be looked at before you can even make a rough guess of the 
time requirements. If we have any hope of producing a general picture of sorting, most of these issues 
have to be ignored, and we must concentrate on those that are fundamental. 


For our discussion of computational complexity, we will make the following simplifying
assumptions.


1. The model for our study will be a Turing machine. The exact type of Turing machine to be used 
will be discussed below. 


2. The size of the problem will be denoted by n. For our sorting problem, n is obviously the number 
of items in the list. Although the size of a problem is not always so easily characterized, we can 
generally relate it in some way to a positive integer. 


3. In analyzing an algorithm, we are less interested in its performance on a specific case than in its 
general behavior. We are particularly concerned with how the algorithm behaves when the 
problem size increases. Because of this, the primary question involves how fast the resource 
requirements grow as n becomes large. 


Our immediate goal will then be to characterize the time requirement of a problem as a function of its 
size, using a Turing machine as the computer model. 


First, we give some meaning to the concept of time for a Turing machine. We think of a Turing 
machine as making one move per time unit, so the time taken by a computation is the number of moves 
made. As stated, we want to study how the computational requirements grow with the size of the 
problem. Normally, in the set of all problems of a given size, there is some variation. Here we are 
interested only in the worst case that has the highest resource requirements. By saying that a 
computation has a time-complexity T(n), we mean that the computation for any problem of size n can 
be completed in no more than T(n) moves on some Turing machine. 


After settling on a specific type of Turing machine as a computational model, we could analyze 
algorithms by writing explicit programs and counting the number of steps involved in solving the 
problem. But, for a variety of reasons, this is not overly profitable. First, the number of operations 
performed may vary with the small details of the program and so may depend strongly on the 
programmer. Second, from a practical standpoint, we are interested in how the algorithm performs in 
the real world, which may differ considerably from how it does on a Turing machine. The best we 
can hope for is that the Turing machine analysis is representative of the major aspects of the real-life 
performance, for example, the asymptotic growth rate of the time complexity. Our first attempt at 
understanding the resource requirements of an algorithm is therefore invariably an order-of-magnitude 
analysis in which we use the O, Θ, and Ω notation introduced in Chapter 1. In spite of the apparent
informality of this approach, we often get very useful information. 


Example 14.1 


Given a set of n numbers x_1, x_2, ..., x_n and a key number x, determine if the set contains x.


Unless the set is organized in some way, the simplest algorithm is just a linear search in which
we compare x successively against x_1, x_2, ..., until either we find a match or we get to the last element
of the set. Since we may find a match on the first comparison or on the last, we cannot predict how
much work is involved, but we know that, in the worst case, we have to make n comparisons. We can
then say that the time-complexity of this linear search is O(n), or even better, Θ(n). In making this
analysis, we made no specific assumptions about what machine this is run on or how the algorithm is 
implemented. But the implication is that if we were to double the size of the set of numbers, the 
computation time would roughly be doubled. This tells us a great deal about searching. 
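The comparison count in Example 14.1 can be observed directly. The following sketch returns the number of comparisons alongside the answer; counting comparisons rather than measuring time is our choice here, in keeping with the machine-independent spirit of the analysis.

```python
def linear_search(items, key):
    """Return (found, comparisons) for a left-to-right search for key."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == key:
            return True, comparisons
    return False, comparisons

small = list(range(1000))
large = list(range(2000))

# Worst case: the key is absent, so every element is compared.
print(linear_search(small, -1))
print(linear_search(large, -1))
```

Doubling the size of the list doubles the worst-case count, as the order-of-magnitude analysis predicts.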


EXERCISES 


1. Suppose you are given a set of n numbers x_1, x_2, ..., x_n and are asked to determine whether this set
contains any duplicates. 


(a) Suggest an algorithm and find an order-of-magnitude expression for its time-complexity. 


(b) Examine if the implementation of the algorithm on a Turing machine affects your 
conclusions. 


2. Repeat Exercise 1, this time determining if the set contains any triplicates. Is the algorithm as 
efficient as possible? 


3. Review how the choice of algorithm affects the efficiency of sorting. What is the time complexity 
of the most efficient sorting algorithms? 


14.2 Turing Machine Models and Complexity 


In the study of computability it makes little difference what particular model of Turing machine we 
use, but we have already seen that the efficiency of a computation can be affected by the number of 
tapes of the machine and by whether it is deterministic or nondeterministic. As Example 12.8 shows, 
nondeterministic solutions are often much more efficient than deterministic alternatives. The next 
example illustrates this even more clearly. 


Example 14.2 


We now introduce the satisfiability problem (SAT), which plays an important role in complexity 
theory. 

A logic or boolean constant or variable is one that can take on exactly two values, true or false,
which we will denote by 1 and 0, respectively. Boolean operators are used to combine boolean
constants and variables into boolean expressions. The simplest boolean operators are or, denoted by
∨ and defined by

0 ∨ 1 = 1 ∨ 0 = 1 ∨ 1 = 1,
0 ∨ 0 = 0,
and the and operator (∧), defined by
0 ∧ 0 = 0 ∧ 1 = 1 ∧ 0 = 0,
1 ∧ 1 = 1.

Also needed is negation, denoted by a bar, and defined by

0̄ = 1,
1̄ = 0.


We consider now boolean expressions in conjunctive normal form (CNF). In this form, we create
expressions from variables x_1, x_2, ..., x_n, starting with

e = t_i ∧ t_j ∧ ⋯ ∧ t_k.  (14.1)

The terms t_i, t_j, ..., t_k are created by or-ing together variables and their negation, that is,

t_i = s_l ∨ s_m ∨ ⋯ ∨ s_p,  (14.2)

where each s_l, s_m, ..., s_p stands for a variable or the negation of a variable. The s_i will be called
literals, while the t_i are said to be clauses of a CNF expression e.

The satisfiability problem is then simply stated: Given a satisfiable expression e in conjunctive
normal form, find an assignment of values to the variables x_1, x_2, ..., x_n that will make the value of e
true. For a specific case, look at

e_1 = (x̄_1 ∨ x_2) ∧ (x_1 ∨ x_3).

The assignment x_1 = 0, x_2 = 1, x_3 = 1 makes e_1 true so that this expression is satisfiable. On the other
hand,

e_2 = (x_1 ∨ x_2) ∧ x̄_1 ∧ x̄_2  (14.3)

is not satisfiable because every assignment for the variables x_1 and x_2 will make e_2 false.


A deterministic algorithm for the satisfiability problem is easy to discover. We take all possible
values for the variables x_1, x_2, ..., x_n, and for each evaluate the expression. Since there are 2^n such
possibilities, this exhaustive approach has exponential time-complexity.

Again, the nondeterministic alternative simplifies matters. If e is satisfiable, we guess the value of
each x_i and then evaluate e. This is essentially an O(n) algorithm. As in Example 12.8, we have a
deterministic exhaustive search algorithm whose complexity is exponential and a linear-time
nondeterministic one. However, unlike Example 12.8, we do not know of any nonexponential
deterministic algorithm.
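The contrast between checking a guessed assignment and searching for one can be made concrete. In the sketch below, the encoding of a CNF expression as a list of clauses with signed integers for literals is an assumption of ours, and the two sample expressions are taken to be (x̄_1 ∨ x_2) ∧ (x_1 ∨ x_3) and (x_1 ∨ x_2) ∧ x̄_1 ∧ x̄_2. The function evaluate does the work of verifying one guess, while brute_force_sat is the exhaustive deterministic search over all 2^n assignments.

```python
from itertools import product

# A literal +i stands for x_i and -i for its negation (encoding assumed here).
def evaluate(cnf, assignment):
    """Evaluate a CNF expression under a dict mapping variable index to bool."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

def brute_force_sat(cnf, n):
    """Try all 2**n assignments; return a satisfying one or None."""
    for values in product([False, True], repeat=n):
        assignment = dict(enumerate(values, start=1))
        if evaluate(cnf, assignment):
            return assignment
    return None

e1 = [[-1, 2], [1, 3]]       # (not x1 or x2) and (x1 or x3)
e2 = [[1, 2], [-1], [-2]]    # (x1 or x2) and not x1 and not x2

print(evaluate(e1, {1: False, 2: True, 3: True}))
print(brute_force_sat(e2, 2))
```

Evaluating a single assignment takes time proportional to the length of the expression, whereas the search loop may run through all 2^n candidates before giving up.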


This example and Examples 12.7 and 12.8 suggest that complexity questions are affected by the 
type of Turing machine we use and that the issue of determinism versus nondeterminism is a 
particularly crucial one. Some general conclusions consistent with these observations can be made. 


Theorem 14.1 


Suppose that a two-tape machine can carry out a computation in n steps. Then this computation can be
simulated by a standard Turing machine in O(n^2) steps.

Proof: For the simulation of the computation on the two-tape machine, the standard machine keeps the
instantaneous description of the two-tape machine on its tape, as shown in Figure 14.1. To simulate
one move, the standard machine needs to search the entire active area of its tape. But since one move
of the two-tape machine can extend the active area by at most two cells, after n moves the active area
has a length of at most O(n). Therefore the entire simulation can be done in O(n^2) moves.
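The arithmetic behind the quadratic bound can be spelled out as follows. Here n_0 (the length of the active area at the start) and c (the cost per cell of one sweep) are symbols introduced only for this sketch, and we assume n_0 = O(n).

```latex
\[
  \sum_{i=1}^{n} c\,(n_0 + 2i)
  \;=\; c\,n_0\,n + c\,n(n+1)
  \;=\; O(n^2).
\]
```

The i-th term bounds the cost of simulating move i, since by then the active area has grown by at most 2i cells.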


Figure 14.1 
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This result is easily extended to more than two tapes (Exercise 6 at the end of this section). m 


Theorem 14.2 


Suppose that a nondeterministic Turing machine M can carry out a computation in $n$ steps. Then a
standard Turing machine can carry out the same computation in $O(k^{an})$ steps, where $k$ and $a$ are
independent of $n$.


Proof: A standard Turing machine can simulate a nondeterministic machine by keeping track of all
possible configurations, continually searching and updating the entire active area. If $k$ is the maximum
branching factor for the nondeterminism, then after $n$ steps there are at most $k^n$ possible
configurations. Since at most one symbol can be added to each configuration by a single move, the
length of one configuration after $n$ moves is $O(n)$. Therefore, to simulate one move, the standard
machine must search an active area of length $O(nk^n)$, leading to the desired result. For some details,
see Exercise 7 at the end of this section. ■
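The bookkeeping behind this simulation can be illustrated with a short Python sketch. It is not a Turing machine simulation; it only shows a deterministic procedure that tracks every reachable configuration level by level, so that the number of tracked configurations can grow by a factor of $k$ per step. The function `successors`, the toy machine, and all names are assumptions made for this illustration.

```python
def simulate_all_branches(start, successors, accepting, max_steps):
    """Deterministically explore every nondeterministic branch, level by level.

    Returns (accepted, sizes), where sizes[t] is the number of
    configurations tracked after t steps.
    """
    frontier = [start]
    sizes = [1]
    for _ in range(max_steps):
        if any(accepting(c) for c in frontier):
            return True, sizes
        # Each configuration has at most k successors, so the list of
        # tracked configurations can grow by a factor of k per step.
        frontier = [nxt for c in frontier for nxt in successors(c)]
        sizes.append(len(frontier))
    return any(accepting(c) for c in frontier), sizes

# Toy machine (hypothetical): a configuration is the string of bits guessed
# so far; each step guesses one more bit, so the branching factor is k = 2.
def guess_bit(config):
    return [config + "0", config + "1"]

accepted, sizes = simulate_all_branches("", guess_bit, lambda c: c == "1011", 4)
```

For this toy machine the tracked list doubles at every step, which is the $k^n$ growth used in the proof.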


We must interpret this theorem carefully. It says that a nondeterministic computation can always 
be performed on a deterministic machine if we are willing to take into account an exponential 
increase in the time required. But this conclusion comes from a particularly simple-minded simulation 
and one can still hope to do better. The exploration of this issue is the core of complexity theory. 


Example 12.7 suggests that algorithms for a multitape machine may be closer to what we might 
use in practice than the cumbersome method for a standard Turing machine. For this reason, we will 
use a multitape Turing machine as our model for studying complexity issues, but as we will see, this 
is a minor issue. 


EXERCISES 


For the exercises in this set, assume that the Turing machines involved are all deterministic. 


1. Find a linear-time algorithm for membership in $\{ww : w \in \{a, b\}^*\}$ using a two-tape Turing
machine. What is the best you could expect on a one-tape machine?


2. Show that any computation that can be performed on a single-tape, off-line Turing machine in time
$O(T(n))$ also can be performed on a standard Turing machine in time $O(T(n))$.


3. Show that any computation that can be performed on a standard Turing machine in time $O(T(n))$
also can be performed on a Turing machine with one semi-infinite tape in time $O(T(n))$.


4. Rewrite the boolean expression 


in conjunctive normal form. 


5. Determine whether or not the expression 
(ay VT VT3)A(T1Vrt2VT3)^A(T1VT2VT3) 
is satisfiable. 


6. Generalize Theorem 14.1 for $k$ tapes, showing that $n$ moves on a $k$-tape machine can be simulated
on a standard machine in $O(n^2)$ moves.


7. In the proof of Theorem 14.2 we ignored one fine point. When a configuration grows, the rest of 
the tape's contents have to be moved. Does this oversight affect the conclusion? 


14.3 Language Families and Complexity Classes 


In the Chomsky hierarchy for language classification, we associate language families with classes of 
automata, where each class of automata is defined by the nature of its temporary storage. Another 
possibility for classifying languages is to use a Turing machine and consider time-complexity a 
distinguishing factor. To do so, we first define the time-complexity of a language. 


Definition 14.1 


We say that a Turing machine M decides a language L in time $T(n)$ if every $w$ in L with $|w| = n$ is
decided in $T(n)$ moves. If M is nondeterministic, this implies that for every $w \in L$, there is at least one
sequence of moves of length less than or equal to $T(|w|)$ that leads to acceptance, and that the Turing
machine halts on all inputs in time $T(|w|)$.


Definition 14.2 


A language L is said to be a member of the class $DTIME(T(n))$ if there exists a deterministic
multitape Turing machine that decides L in time $O(T(n))$.


A language L is said to be a member of the class $NTIME(T(n))$ if there exists a nondeterministic
multitape Turing machine that decides L in time $O(T(n))$.


Some relations between these complexity classes, such as
$$DTIME(T(n)) \subseteq NTIME(T(n)),$$
and
$$T_1(n) = O(T_2(n))$$
implies
$$DTIME(T_1(n)) \subseteq DTIME(T_2(n)),$$


are obvious, but from here the situation becomes obscure quickly. What we can say is that as the
order of $T(n)$ increases, we take in progressively more languages.


Theorem 14.3 


For every integer $k \geq 1$,
$$DTIME(n^k) \subset DTIME(n^{k+1}).$$


Proof: This follows from a result in Hopcroft and Ullman (1979, p. 299). ■


The conclusion we can draw from this is that there are some languages that can be decided in time
$O(n^2)$ for which there is no linear-time membership algorithm, that there are languages in $DTIME(n^3)$
that are not in $DTIME(n^2)$, and so on. This gives us an infinite number of nested complexity classes.
We get even more if we allow exponential time complexity. In fact, there is no limit to this; no matter
how rapidly the complexity function $T(n)$ grows, there is always something outside $DTIME(T(n))$.


Theorem 14.4 


There is no total Turing computable function $f(n)$ such that every recursive language can be decided
in time $f(n)$, where $n$ is the length of the input string.


Proof: Consider the alphabet $\Sigma = \{0, 1\}$, with all strings in $\Sigma^*$ arranged in proper order $w_1, w_2, \ldots$.
Also, assume that we have a proper ordering $M_1, M_2, \ldots$ for the Turing machines.


Assume now that the function $f(n)$ in the statement of the theorem exists. We can then define the
language


$$L = \{w_i : M_i \text{ does not decide } w_i \text{ in } f(|w_i|) \text{ steps}\}. \qquad (14.4)$$


We claim that L is recursive. To see this, consider any $w \in \Sigma^*$ and compute first $f(|w|)$. By assuming
that $f$ is a total Turing computable function, this is possible. We next find the position $i$ of $w$ in the
sequence $w_1, w_2, \ldots$. This is also possible because the sequence is in proper order. When we have $i$,
we find $M_i$ and let it operate on $w$ for $f(|w|)$ steps. This will tell us whether or not $w$ is in L, so L is
recursive.


But we can now show that L is not decidable in time $f(n)$. Suppose it were. Since L is recursive,
there is some $M_k$ such that $L = L(M_k)$. Is $w_k$ in L? If we claim that $w_k$ is in L, then $M_k$ decides $w_k$
in $f(|w_k|)$ steps. But this contradicts (14.4). Conversely, we get a contradiction if we assume that $w_k \notin L$.
The inability to resolve this issue is a typical diagonalization result and leads us to conclude that
the original assumption, namely the existence of a computable $f(n)$, must be false. ■


Theorem 14.3 allows us to make some claims, for example, that there is a language in $DTIME(n^4)$
that is not in $DTIME(n^3)$. Although this may be of theoretical interest, it is not clear that such a
result has any practical significance. At this point, we have no clue what the characteristics of a
language in $DTIME(n^4)$ might be. We can get a little more insight into the matter if we relate the
complexity classification to the languages in the Chomsky hierarchy. We will look at some simple
examples that give some of the more obvious results.


Example 14.3 


Every regular language can be recognized by a deterministic finite automaton in time proportional to
the length of the input. Therefore,


$$L_{REG} \subseteq DTIME(n).$$


But $DTIME(n)$ includes much more than $L_{REG}$. We have already established in Example 13.7 that the
context-free language $\{a^n b^n : n \geq 0\}$ can be recognized by a two-tape machine in time $O(n)$. The
argument given there can be used for even more complicated languages.


Example 14.4 


The non-context-free language $L = \{ww : w \in \{a, b\}^*\}$ is in $NTIME(n)$. This is straightforward, as
we can recognize strings in this language by the following algorithm:


1. Copy the input from the input file to tape 1. Nondeterministically guess the middle of this string.

2. Copy the second part to tape 2.

3. Compare the symbols on tape 1 and tape 2 one by one.

Clearly, all of the steps can be done in $O(|w|)$ time, so $L \in NTIME(n)$.


Actually, we can show that $L \in DTIME(n)$ if we can devise an algorithm for finding the middle of
a string in $O(n)$ time. This can be done: We look at each symbol on tape 1, keeping a count on tape 2,
but counting only every second symbol. We leave the details as an exercise.
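The deterministic idea can be mimicked in Python. The following is a rough sketch, not a Turing machine program: two lists stand in for the two tapes, and the function name `is_ww` is an assumption made for this illustration. Each pass touches every symbol a constant number of times, in the spirit of the linear-time argument.

```python
def is_ww(s):
    """Decide membership in {ww : w in {a,b}*} with linear-time passes."""
    tape1 = list(s)
    # Pass 1: scan tape 1, writing a mark on tape 2 for every second symbol.
    tape2 = [1 for i in range(len(tape1)) if i % 2 == 1]
    if len(tape1) % 2 != 0:
        return False            # odd length: cannot be of the form ww
    middle = len(tape2)          # the count on tape 2 locates the middle
    # Pass 2: split at the middle, then compare symbol by symbol.
    first, second = tape1[:middle], tape1[middle:]
    return all(x == y for x, y in zip(first, second))
```

For example, `is_ww("abab")` is true, while `is_ww("abba")` and `is_ww("aba")` are false.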


Example 14.5 


It follows from Example 12.8 that

$$L_{CF} \subseteq DTIME(n^3)$$
and

$$L_{CF} \subseteq NTIME(n).$$


Consider now the family of context-sensitive languages. Exhaustive search parsing is possible here
also since only a limited number of productions are applicable at each step. Following the analysis
leading to Equation (5.2), we see that the maximum number of sentential forms is


$$N = |P| + |P|^2 + \cdots + |P|^{cn} = O(|P|^{cn+1}).$$
Note, however, that we cannot claim from this that
$$L_{CS} \subseteq DTIME(|P|^{cn+1}),$$


because we cannot put an upper bound on $|P|$ and $c$.


From these examples we note a trend: As $T(n)$ increases, more and more of the families $L_{REG}$,
$L_{CF}$, $L_{CS}$ are covered. But the connection between the Chomsky hierarchy and the complexity classes
is tenuous and not very clear.


EXERCISES 


1. Complete the argument in Example 14.4. 

2. Show that $L = \{ww^R w : w \in \{a, b\}^*\}$ is in $DTIME(n)$.

3. Show that $L = \{www : w \in \{a, b\}^*\}$ is in $DTIME(n)$.

4. Show that there are languages that are not in $NTIME(2^n)$.


14.4 The Complexity Classes P and NP 


At this point, it is instructive to summarize the difficulties we have encountered in trying to find useful 
complexity classes for formal languages and draw a few conclusions. 


1. There exists an infinite number of properly nested complexity classes $DTIME(n^k)$, $k = 1, 2, \ldots$.
These complexity classes have little connection to the familiar Chomsky hierarchy and it seems 
difficult to get any insight into the nature of these classes. Perhaps this is not a good way of 
classifying languages. 


2. The particular model of Turing machine, even if we restrict ourselves to deterministic machines, 
affects the complexity. It is not clear what kind of Turing machine is the best model of an actual 
computer, so an analysis should not depend on any particular type of Turing machine. 


3. We have found several languages that can be decided efficiently by a nondeterministic Turing 
machine. For some, there are also reasonable deterministic algorithms, but for others we know 
only inefficient, brute-force methods. What is the implication of these examples? 

Since the attempt to produce meaningful language hierarchies via time-complexities with different
growth rates appears to be unproductive, let us ignore some factors that are less important, for
example by removing some uninteresting distinctions, such as that between $DTIME(n^k)$ and
$DTIME(n^{k+1})$. We can argue that the difference between, say, $DTIME(n)$ and $DTIME(n^2)$ is not
fundamental, since some of it depends on the specific model of Turing machine we have (e.g., how
many tapes). This leads us to consider the famous complexity class


$$P = \bigcup_{i \geq 1} DTIME(n^i).$$


This class includes all languages that are accepted by some deterministic Turing machine in
polynomial time, without any regard to the degree of the polynomial. As we have already seen, $L_{REG}$
and $L_{CF}$ are in P.


Since the distinction between deterministic and nondeterministic complexity classes appears to be
fundamental, we also introduce


$$NP = \bigcup_{i \geq 1} NTIME(n^i).$$


Obviously
$$P \subseteq NP,$$


but what is not known is if this containment is proper. While it is generally believed that there are
some languages in NP that are not in P, no one has yet found an example of this.


The interest in these complexity classes, particularly in the class P, comes from an attempt to 
distinguish between realistic and unrealistic computations. Certain computations, although 
theoretically possible, have such high resource requirements that in practice they must be rejected as 
unrealistic on existing computers, as well as on supercomputers yet to be designed. Such problems 
are sometimes called intractable to indicate that, while in principle computable, there is no realistic 
hope of a practical algorithm. To understand this better, computer scientists have attempted to put the
idea of intractability on a formal basis. One attempt to define the term intractable is made in what is
generally called the Cook-Karp thesis. In the Cook-Karp thesis, a problem that is in P is called
tractable, and one that is not is said to be intractable.

Is the Cook-Karp thesis a good way of separating problems we can solve realistically from those
we cannot? The answer is not clear-cut. Obviously, any computation that is not in P has
time-complexity that grows faster than any polynomial, and its requirements will increase very quickly
with the problem size. Even for a function like $2^{0.1n}$ this will be excessive for large $n$, say $n > 1000$,
so we might feel justified in calling a problem with this complexity intractable. But what about
problems that are in $DTIME(n^{100})$? While the Cook-Karp thesis calls such a problem tractable, one
surely cannot do much with it, even for small $n$. The justification for the Cook-Karp thesis seems to
lie in the empirical observation that most practical problems in P are in $DTIME(n)$, $DTIME(n^2)$, or
$DTIME(n^3)$, while those outside this class tend to have exponential complexities. Among practical
problems, a clear distinction exists between problems in P and those not in P.
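A little arithmetic makes the comparison concrete. At $n = 1000$ the function $2^{0.1n}$ equals $2^{100}$, roughly $1.27 \times 10^{30}$, and $n^{100}$ reaches that same value already at $n = 2$. The Python sketch below computes both; the machine speed of $10^9$ steps per second is an assumption chosen only to give a sense of scale, and the function names are hypothetical.

```python
def exponential_cost(n):
    """2**(0.1*n), computed exactly for n divisible by 10."""
    return 2 ** (n // 10)

def polynomial_cost(n):
    """n**100, a polynomial bound of very high degree."""
    return n ** 100

# Assumed machine speed, chosen only to give a sense of scale.
STEPS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 3600

years_exponential = exponential_cost(1000) / (STEPS_PER_SECOND * SECONDS_PER_YEAR)
years_polynomial = polynomial_cost(2) / (STEPS_PER_SECOND * SECONDS_PER_YEAR)
```

Under this assumed speed, both quantities come out to more than $10^{13}$ years, which is why neither bound is of much practical comfort.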


14.5 Some NP Problems 


Computer scientists have studied many NP problems, that is, problems that can be solved 
nondeterministically in polynomial time. Some of the arguments involved in this are very technical, 
with a number of details that have to be resolved. 

Traditionally, complexity questions are studied as languages, in such a way that the cases that
satisfy the stated conditions are described by strings in some language L, while those that do not are
in $\overline{L}$. So, often the first thing that needs to be done is to rephrase our intuitive understanding of the
problem in terms of a language.


Example 14.6 


Reconsider the SAT problem. We made some rudimentary argument to claim that this problem can be 
solved efficiently by a nondeterministic Turing machine and, rather inefficiently, by a brute-force 
exponential search. A number of minor points were ignored in that argument. 


Suppose that a CNF expression has length $n$, with $m$ different literals. Since clearly $m < n$, we can
take $n$ as the problem size. Next, we must encode the CNF expression as a string for a Turing
machine. We can do this, for example, by taking $\Sigma = \{x, \lor, \land, (, ), \overline{\phantom{x}}, 0, 1\}$ and encoding the subscript
of $x$ as a binary number. In this system, the CNF expression $(x_1 \lor \overline{x}_2) \land (x_3 \lor x_4)$ is encoded as


$$(x1 \lor \overline{x}10) \land (x11 \lor x100).$$


Since the subscript cannot be larger than $m$, the maximum length of any subscript is $\log_2 m$. As a
consequence the maximum encoded length of an $n$-symbol CNF is $O(n \log n)$.


The next step is to generate a trial solution for the variables. Nondeterministically, this can be
done in $O(n)$ time. (See Exercise 1 at the end of this section.) This trial solution is then substituted
into the input string. This can be done in $O(n^2 \log n)$ time*. The entire process therefore can be done in
$O(n^2 \log n)$ or $O(n^3)$ time, and $SAT \in NP$.
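The encoding step can be sketched in Python. This is an illustration under stated assumptions: literals are given as signed integers (i for $x_i$, -i for its complement), and the ASCII characters `V`, `A`, and `-` stand in for the or, and, and complement symbols of the alphabet. The function name `encode_cnf` is hypothetical.

```python
def encode_cnf(cnf):
    """Encode a CNF expression as a string; subscripts are written in binary.

    A literal is a signed integer: i stands for x_i and -i for its complement.
    The ASCII characters 'V', 'A', and '-' stand in for the or, and, and
    complement symbols of the alphabet (an assumption of this sketch).
    """
    def literal(lit):
        bar = "-" if lit < 0 else ""
        return bar + "x" + format(abs(lit), "b")
    clauses = ["(" + "V".join(literal(l) for l in clause) + ")" for clause in cnf]
    return "A".join(clauses)

encoded = encode_cnf([[1, -2], [3, 4]])
```

Each subscript contributes a number of characters logarithmic in its value, which is where the $\log$ factor in the encoded length comes from.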


There are a large number of graph problems that have been studied and are known to be in NP. 
Example 14.7 


The Hamiltonian Path Problem Given an undirected graph with vertices $v_1, v_2, \ldots, v_n$, a
Hamiltonian path is a simple path that passes through all the vertices. The graph in Figure 14.2 has a
Hamiltonian path $(v_2, v_1), (v_1, v_3), (v_3, v_5), (v_5, v_4), (v_4, v_6)$. The Hamiltonian path problem
(HAMPATH) is to decide if a given graph has a Hamiltonian path.

A deterministic algorithm is easily found, since any Hamiltonian path is a permutation of the
vertices $v_1, v_2, \ldots, v_n$. There are $n!$ such permutations, and a brute-force search of all of them will
give the answer. Unfortunately, this comes at a great expense, even for modest $n$.


Figure 14.2


To explore the nondeterministic solution, we must first find a way to represent a graph by a string.
One of the simplest and most convenient ways of encoding graphs is by an adjacency matrix. For a
directed graph with vertices $v_1, v_2, \ldots, v_n$ and edge set E, an adjacency matrix is an $n \times n$
array in which $a(i, j)$, the entry in the ith row and jth column, satisfies
$$a(i, j) = \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{if } (v_i, v_j) \notin E. \end{cases}$$
An undirected edge can be considered two separate edges in opposite directions. The array below
represents the graph in Figure 14.2.
010 00 
LUCO ITI 
110010 
vovg Ii 
SAE Da 
DVT T0 


A graph with $n$ vertices then requires a string of length $n^2$ for its representation. For an undirected
graph the matrix is symmetric, so the storage requirement can be reduced to $(n + 1)n/2$, but in any
case, the input string will have length $O(n^2)$.


Next, we generate, nondeterministically, a permutation of the vertices. This can be done in O(n*) 
time. Finally, we check the permutation to see if it constitutes a path. A time O(n’) is sufficient for 
this. Therefore, HAMPATH $\in$ NP.
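The guess-and-check structure of this argument can be sketched in Python. The function `is_hamiltonian_path` corresponds to the checking phase applied to one guessed permutation, and `brute_force_hampath` is the deterministic search over all $n!$ permutations. The two small graphs are hypothetical examples with 0-based vertex numbers, not the graph of Figure 14.2, and all names are assumptions of this sketch.

```python
from itertools import permutations

def is_hamiltonian_path(adj, order):
    """Verify a guessed vertex order against the adjacency matrix."""
    n = len(adj)
    if sorted(order) != list(range(n)):
        return False             # not a permutation of all the vertices
    return all(adj[order[i]][order[i + 1]] == 1 for i in range(n - 1))

def brute_force_hampath(adj):
    """Deterministic search over all n! permutations."""
    return any(is_hamiltonian_path(adj, list(p))
               for p in permutations(range(len(adj))))

# Hypothetical 4-vertex graphs (0-based vertex numbers), not Figure 14.2.
path_graph = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
star_graph = [[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]]
```

The first graph has a Hamiltonian path, while the star-shaped graph does not, since its three outer vertices are adjacent only to the center.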


Example 14.8 


The Clique Problem Let G be an undirected graph with vertices $v_1, v_2, \ldots, v_n$. A k-clique is a set of
$k$ vertices $V_k \in 2^V$, where $V$ is the vertex set of G, such that there is an edge between every pair of
vertices $v_i, v_j \in V_k$. The clique problem
(CLIQ) is to decide if, for a given $k$, G has a k-clique.


A deterministic search can examine all the elements of $2^V$. This is straightforward, but has
exponential time-complexity. A nondeterministic algorithm just guesses the correct subset. The 
representation of the graph and the checking are similar to the previous example, so we claim without 
further elaboration that the clique problem can be solved in O(n*) time and that 


CLIQ $\in$ NP.
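As with HAMPATH, the check applied to a guessed subset is simple, while the deterministic search is expensive. The Python sketch below is an illustration only: it restricts the search to subsets of size $k$ rather than all subsets, the graph is a hypothetical example, and the function names are assumptions.

```python
from itertools import combinations

def is_clique(adj, subset):
    """Verify a guessed subset: every pair of its vertices must be adjacent."""
    return all(adj[u][v] == 1 for u, v in combinations(subset, 2))

def brute_force_clique(adj, k):
    """Deterministic search over all k-element subsets of the vertices."""
    for subset in combinations(range(len(adj)), k):
        if is_clique(adj, subset):
            return subset
    return None

# Hypothetical graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
graph = [[0, 1, 1, 0],
         [1, 0, 1, 0],
         [1, 1, 0, 1],
         [0, 0, 1, 0]]
```

Here `brute_force_clique(graph, 3)` finds the triangle, and `brute_force_clique(graph, 4)` returns `None`.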


There are many other such problems: some similar to our examples, others quite different, but all 
sharing the same characteristics. 


1. All problems are in NP and have simple nondeterministic solutions. 


2. All problems have deterministic solutions with exponential time-complexity, but it is not known if 
they are tractable. 


To get further insight into the connection between these various problems, we need to find some 
commonality for all these seemingly different cases. 


EXERCISES 


1. In Example 14.6, show how a trial solution can be generated in $O(n)$ time. This means that all $2^n$
possibilities must be generated in a decision tree with height $O(n)$.


2. Show how in Example 14.6 the checking of the trial solution can be done in $O(n^2 \log n)$ time.

3. Discuss how in HAMPATH a permutation can be generated nondeterministically in O(n?) time.

4. In HAMPATH, how can the checking for a Hamiltonian path be done in O(n*) time?

5. Show that a k-clique must have exactly $k(k - 1)/2$ edges.

6. Find a 4-clique in the graph below. Prove that the graph has no 5-clique.


[Figure: graph for Exercise 6]


7. Give the details of the argument in Example 14.8. 


8. Show that P is closed under union, intersection, and complementation. 


14.6 Polynomial-Time Reduction 


One way to unify different cases is to see if we can reduce them to each other, in the sense that if one 
is tractable, the others will be tractable also. 


Definition 14.3 


A language $L_1$ is said to be polynomial-time reducible to another language $L_2$ if there exists a
deterministic Turing machine by which any $w_1$ in the alphabet of $L_1$ can be transformed in polynomial
time to a $w_2$ in the alphabet of $L_2$ in such a way that $w_1 \in L_1$ if and only if $w_2 \in L_2$.


Example 14.9 


In the satisfiability problem we put no restriction on the length of a clause. A restricted type of 
satisfiability is the three-satisfiability problem (3SAT) in which each clause can have at most three 
literals. The SAT problem is polynomial-time reducible to 3SAT. 


We illustrate the reduction with the simple 4-literal expression
$$e_1 = (x_1 \lor x_2 \lor x_3 \lor x_4).$$
We introduce a new variable $z$ and construct


$$e_2 = (x_1 \lor x_2 \lor z) \land (x_3 \lor x_4 \lor \overline{z}).$$


If $e_1$ is true, one of the $x_1, x_2, x_3, x_4$ must be true. If $x_1 \lor x_2$ is true, we choose $z = 0$, and $e_2$ is true. If
$x_3 \lor x_4 = 1$, we can choose $z = 1$ to satisfy $e_2$. Conversely, if $e_2$ is true, $e_1$ must also be true, so for
satisfiability, $e_1$ and $e_2$ are equivalent.


We claim that this pattern can be extended to clauses with more than four literals, but we will
leave the argument as an exercise. We include an exercise to show that the conversion of a CNF
expression from the SAT form to 3SAT form can be done deterministically in polynomial time.
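One common way to carry the pattern further is to chain new variables, peeling two literals off a long clause at a time. The Python sketch below shows this under stated assumptions: literals are signed integers (-i is the complement of $x_i$), the chaining scheme is the standard textbook one rather than anything fixed by the example, and the function names are hypothetical. The brute-force `satisfiable` helper is included only so that small cases can be compared.

```python
from itertools import product

def to_3sat(cnf, n):
    """Split every clause with more than three literals, using new variables.

    Literals are signed integers (-i is the complement of x_i); n is the
    number of variables in cnf. Returns (new_cnf, new_n).
    """
    result = []
    for clause in cnf:
        clause = list(clause)
        while len(clause) > 3:
            n += 1                                 # fresh variable z = x_n
            result.append(clause[:2] + [n])        # (l1 or l2 or z)
            clause = [-n] + clause[2:]             # (not z or l3 or ...)
        result.append(clause)
    return result, n

def satisfiable(cnf, n):
    """Brute-force check, used here only to compare the two expressions."""
    return any(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in cnf)
        for bits in product([False, True], repeat=n)
    )

three_sat, m = to_3sat([[1, 2, 3, 4]], 4)
```

For the 4-literal clause, the result has the same two-clause shape as the construction in the example, with the new variable numbered 5. Each pass through the loop adds one clause and one variable, so the output size is linear in the input size.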


Example 14.10 


The 3SAT problem is polynomial-time reducible to CLIQUE. 

We can assume that in any 3SAT expression each clause has exactly three literals. If in some
expression this is not the case, we just add extra literals that do not change the satisfiability. For
example, $(x_1 \lor x_2)$ is equivalent to $(x_1 \lor x_2 \lor x_2)$. Consider now the expression


e = (£1 V T2 V T3) A (Ti V Tə V z3) A (T1 V T3 V z3) A (£1 V T2 V £3). 


Figure 14.3


We draw a graph in which each clause is represented by a group of three vertices and each literal is
associated with one of the vertices (Figure 14.3).
For each vertex in a group, we put in an edge to all vertices of the other groups, unless the two
associated literals are complements. So in Figure 14.3 we draw an edge† from $(\overline{x}_2)_1$ to $(x_3)_2$, and from
$(\overline{x}_2)_1$ to $(x_3)_3$, but not from $(\overline{x}_2)_1$ to $(x_2)_2$. Figure 14.3 shows a subset of the edges (the full set looks
very messy). Notice that the subgraph with vertices $(\overline{x}_2)_1$, $(x_3)_2$, $(x_3)_3$, and $(x_1)_4$ is a 4-clique and that


$$x_1 = 1, \quad x_2 = 0, \quad x_3 = 1$$


is a variable assignment that satisfies $e$.


This approach can be generalized for any 3SAT expression with $k$ clauses. It can be shown that
a 3SAT expression is satisfiable if and only if the associated graph has a k-clique. Furthermore, it
is not hard to see that the transformation from the 3SAT expression to the graph can be done
deterministically in polynomial time.
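The construction of the graph can be sketched in Python. This is an illustration with assumed conventions: a literal is a signed integer (-i is the complement of $x_i$), a vertex is a pair (group, literal), and the two test expressions are hypothetical ones, not the expression of the example. The brute-force clique test is only there to check small cases.

```python
from itertools import combinations

def clique_graph(cnf):
    """Build the graph: one vertex (group, literal) per literal occurrence."""
    vertices = [(g, lit) for g, clause in enumerate(cnf) for lit in clause]
    edges = {
        frozenset((u, v))
        for u, v in combinations(vertices, 2)
        if u[0] != v[0] and u[1] != -v[1]   # different groups, not complements
    }
    return vertices, edges

def has_k_clique(vertices, edges, k):
    """Brute-force clique test, for checking small examples only."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )

# Hypothetical expressions (not the one in the example); -i is the
# complement of x_i.
sat_expr = [[1, -2, 3], [-1, 2, 3], [1, 2, -3]]
unsat_expr = [[1, 1, 1], [-1, -1, -1]]
```

Building the graph examines every pair of literal occurrences once, so for $k$ clauses it takes on the order of $k^2$ pair checks, which is polynomial in the size of the expression.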


The point of these reductions is that we can now look at a given problem in several ways.
Suppose we conjecture that SAT is tractable. If this is difficult to prove, we might try the simpler
3SAT case. If this does not work either, we can try to find an efficient algorithm for the clique
problem, or some other related graph problem. If any of the options can be shown to be tractable, we
can claim that SAT is tractable.


EXERCISES 


1. Show how a CNF expression with clauses of five literals can be reduced to the 3SAT form. 
Generalize your method for clauses with an arbitrary number of literals. 


2. Show how the reduction of SAT to 3SAT can be done in polynomial-time. 


3. Justify the statement in Example 14.10 that a 3SAT expression with k clauses is satisfiable if and 
only if the associated graph has a k-clique. 


4. Show that the construction of the graph associated with a 3SAT expression can be done
deterministically in polynomial time.


* 5. The Traveling Salesman Problem (TSP) can be stated as follows. Let G be a complete graph, 
whose edges are assigned nonnegative weights. The weight of a simple path is the sum of all the 
edge weights. The TSP problem is to decide if, for any given k > 0, the graph G has a Hamiltonian 
path with a weight less than or equal to k. 


Show that HAMPATH is polynomial-time reducible to TSP. 


14.7 NP-Completeness and an Open Question 
There are a number of problems that are central to complexity study and are such that, if we 
completely understood one of them, we would understand the major issue involved in tractability. 


Definition 14.4 


A language L is said to be NP-complete if $L \in NP$ and every $L' \in NP$ is polynomial-time
reducible to L.


It follows from this definition that if L is NP-complete and polynomial-time reducible to $L_1$, and $L_1$ is in NP, then
$L_1$ is also NP-complete. So if we can find one deterministic polynomial-time algorithm for any
NP-complete language, then every language in NP is also in P, that is,


$$P = NP.$$


Here we hold out the hope that efficient algorithms exist for such problems, even if none have been 
found yet. On the other hand, if one could prove that any of the many known NP-complete problems is 
intractable, then many interesting problems are not practically solvable. This puts NP-completeness 
in the center of the question of the tractability of many important problems. At this point then, we need 
to study some NP-complete problems. The next result, known as Cook's theorem, provides the entry 
point to this study. 


Theorem 14.5 


The Satisfiability Problem (SAT) is NP-complete.


Proof: The idea behind the proof is that for every configuration sequence of a Turing machine one 
can construct a CNF expression that is satisfiable if and only if there is a sequence of configurations 
leading to acceptance. The details, unfortunately, are long and complicated. They are technical and 
shed little light on the NP problem, so we will omit them here. Extensive discussions of Cook's 
theorem can be found in books devoted to complexity. ■


If we accept Cook's theorem, we immediately have a number of NP-complete problems. 


Example 14.11 


We have shown that SAT can be reduced to 3SAT, and that 3SAT can be reduced to CLIQ. Therefore 
3SAT and CLIQ are both NP-complete. 


It turns out that HAMPATH is also NP-complete, but the reductions needed to show this are less 
obvious. 


In addition to SAT, 3SAT, CLIQ, and HAMPATH, there are many more problems that are known 
to be NP-complete. A good deal of effort has been expended in trying to find efficient algorithms for 
any of these, so far without success. This leads us to conjecture that 


$$P \neq NP$$


and that many important problems are intractable. But while this is a reasonable working conjecture,
it has not been proved. It remains the fundamental open problem in complexity theory.


EXERCISES 


1. Show that TSP is NP-complete. 


* 2. Let G be an undirected graph. An Euler circuit of the graph is a simple cycle that includes all 
edges. The Euler Circuit Problem (EULER) is to decide if G has an Euler circuit. 


Show that EULER is not NP-complete. 


3. Consult books on complexity theory to compile a list of NP-complete problems. 
4. Is it possible that P = NP is undecidable? 


* This estimate depends on how we visualize this process and can be improved. But this is of no concern, as we do not distinguish
between the degrees of polynomials.


† The second subscripts identify the group to which the vertices belong.


Appendix A 


Finite-State Transducers


Finite accepters play a central role in the study of formal languages, but in other areas, such
as digital design, transducers are more important. While we cannot go into this subject
matter in any depth, we can at least outline the main ideas. A full treatment, with many
examples of practical applications, can be found in Kohavi and Jha, 2010.


A.1 A General Framework


Finite-state transducers (fst's) have many things in common with finite accepters. An fst has a finite
set Q of internal states and operates in a discrete time frame with transitions from one state to another
made in the interval between two instances $t_n$ and $t_{n+1}$. An fst is associated with a read-once-only
input file that contains a string from an input alphabet $\Sigma$, and an output mechanism that produces a
string from an output alphabet $\Gamma$ in response to a given input. It will be assumed that in each time
step one input symbol is used, while a single output symbol is produced (we also say printed).


Since an fst just translates certain strings into other strings, we can look at the fst as an
implementation of a function. If M is an fst, we will let $F_M$ denote the function represented by M, so


$$F_M : D \to R,$$


where D is a subset of $\Sigma^*$ and R is a subset of $\Gamma^*$. For most of the discussion we assume that $D = \Sigma^*$.


Interpreting an fst as a function implies that it is deterministic, that is, the output is uniquely 
determined by the input. Nondeterminism is an important issue in language theory, but plays no 
significant role in the study of finite-state transducers. 


The rule that one input symbol results in one output symbol appears to imply that the mapping $F_M$
is length-preserving, that is, that


$$|F_M(w)| = |w|.$$


But this is more apparent than real. For example, we can always include the empty string $\lambda$ in $\Gamma$, so
that


$$|F_M(w)| < |w|$$


becomes possible. There are other ways in which the length-preserving restriction can be overcome.


There are several types of fst's that have been extensively studied. The main difference between
them is in how the output is produced.


A.2 Mealy Machines


In a Mealy machine, the output produced by each transition depends on the internal state prior to the
transition and the input symbol used in the transition, so we can think of the output as being produced during
the transition.


Definition A.1 


A Mealy machine is defined by the sextuple
$$M = (Q, \Sigma, \Gamma, \delta, \theta, q_0),$$


where

Q is a finite set of internal states,

$\Sigma$ is the input alphabet,

$\Gamma$ is the output alphabet,

$\delta : Q \times \Sigma \to Q$ is the transition function,

$\theta : Q \times \Sigma \to \Gamma$ is the output function,

$q_0 \in Q$ is the initial state of M.

The machine starts in state $q_0$, at which time all input is available for processing. If at time $t_n$ the
Mealy machine is in state $q_i$, the current input symbol is $a$, and $\delta(q_i, a) = q_j$, $\theta(q_i, a) = b$, the machine
will enter state $q_j$ and produce output $b$. It is assumed the entire process is terminated when the end of
the input is reached. Note that there are no final states associated with a transducer.


Transition graphs are as useful here as they are for finite accepters. In fact, the only difference is 
that now the transition edges are labeled with a/b, where a is the current input symbol and b is the 


output produced by the transition. 


Example A.1 


The fst with $Q = \{q_0, q_1\}$, $\Sigma = \{0, 1\}$, $\Gamma = \{a, b, c\}$, initial state $q_0$, and


$$\delta(q_0, 0) = q_1, \quad \delta(q_0, 1) = q_0,$$
$$\delta(q_1, 0) = q_0, \quad \delta(q_1, 1) = q_1,$$
$$\theta(q_0, 0) = a, \quad \theta(q_0, 1) = c,$$
$$\theta(q_1, 0) = b, \quad \theta(q_1, 1) = a$$


is represented by the graph in Figure A.1. This Mealy machine prints out caab when given the input
string 1010.


Figure A.1
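The operation of this machine can be traced with a short Python sketch. The dictionary representation and the function name `run_mealy` are assumptions made for this illustration, and the two output entries for $q_0$ are inferred from the stated output caab rather than read directly from the definition.

```python
def run_mealy(delta, theta, start, word):
    """Run a Mealy machine; one output symbol is produced per transition."""
    state, output = start, []
    for symbol in word:
        output.append(theta[(state, symbol)])   # output depends on old state
        state = delta[(state, symbol)]
    return "".join(output)

delta = {("q0", "0"): "q1", ("q0", "1"): "q0",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
# The two entries for q0 are inferred from the stated output caab.
theta = {("q0", "0"): "a", ("q0", "1"): "c",
         ("q1", "0"): "b", ("q1", "1"): "a"}

result = run_mealy(delta, theta, "q0", "1010")
```

With these tables, the run on 1010 visits $q_0, q_0, q_1, q_1, q_0$ and the variable `result` holds the string caab.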


Example A.2 


Construct a Mealy machine M that takes as input strings of 0's and 1's. Its output is to be a string of
0's until the first 1 occurs in the input, at which time it will switch to printing 1's. This is to continue
until the next 1 is encountered in the input, when the output reverts to 0. The alternation continues
every time a 1 is encountered. For example, $F_M(0010010) = 0011100$. This fst is a simple model for
a flip-flop circuit. Figure A.2 shows a solution.


Figure A.2
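One possible two-state solution can be written out as tables and checked in Python. This is a sketch, not necessarily the machine drawn in the figure: the state names, the dictionary representation, and the function `run_mealy` are assumptions made for this illustration. State q0 means "currently printing 0" and q1 means "currently printing 1".

```python
def run_mealy(delta, theta, start, word):
    """Run a Mealy machine; one output symbol is produced per transition."""
    state, output = start, []
    for symbol in word:
        output.append(theta[(state, symbol)])
        state = delta[(state, symbol)]
    return "".join(output)

# A 0 keeps the current mode; a 1 toggles it and prints the new mode.
flip_delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
              ("q1", "0"): "q1", ("q1", "1"): "q0"}
flip_theta = {("q0", "0"): "0", ("q0", "1"): "1",
              ("q1", "0"): "1", ("q1", "1"): "0"}

flip_output = run_mealy(flip_delta, flip_theta, "q0", "0010010")
```

On the input 0010010 this produces 0011100, in agreement with the example.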


A.3 Moore Machines 

Moore machines differ from Mealy machines in the way output is produced. In a Moore machine 
every state is associated with an element of the output alphabet. Whenever the state is entered, this 
output symbol is printed. The output is produced only when a transition occurs; thus, the symbol 
associated with the initial state is not printed at the start, but may be produced if this state is entered 
at a later stage. 


Definition A.2 


A Moore machine is defined by the sextuple


$$M = (Q, \Sigma, \Gamma, \delta, \theta, q_0),$$


where

Q is a finite set of internal states,

$\Sigma$ is the input alphabet,

$\Gamma$ is the output alphabet,

$\delta : Q \times \Sigma \to Q$ is the transition function,

$\theta : Q \to \Gamma$ is the output function,

$q_0 \in Q$ is the initial state.

The machine starts in state $q_0$, at which time all input is available for processing. If at time $t_n$ the
Moore machine is in state $q_i$, the current input symbol is $a$, and $\delta(q_i, a) = q_j$, $\theta(q_j) = b$, the machine
will enter state $q_j$ and produce output $b$.


In the transition graph of a Moore machine, each vertex now has two labels: the state name and
the output symbol associated with the state.


Example A.3 


A Moore machine solution for the problem in Example A.2 is given in Figure A.3. 
Figure A.3 
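
Since the figure is not reproduced here, the following sketch shows one Moore machine that solves the flip-flop problem of Example A.2. It is an assumed two-state solution (state names and encoding are hypothetical), not necessarily the machine drawn in Figure A.3.

```python
def run_moore(delta, theta, start, word):
    """Return the output of a Moore machine; a symbol is printed on entering a state."""
    state, out = start, []
    for symbol in word:
        state = delta[(state, symbol)]
        out.append(theta[state])  # the start state's symbol is not printed initially
    return "".join(out)

# Assumed flip-flop: q0 carries output 0, q1 carries output 1; a 1 toggles the state.
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q1", ("q1", "1"): "q0"}
theta = {"q0": "0", "q1": "1"}

print(run_moore(delta, theta, "q0", "0010010"))  # 0011100
```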




Example A.4 


In Example 1.17 we constructed a transducer for adding two positive binary numbers. Figure 1.9 
shows that what we have constructed is actually a two-state Mealy machine. A Moore machine for 


this problem is also easily constructed, but now we need four states to keep track not only of the 
carry, but also of the output symbol. A solution is shown in Figure A.4. 


Figure A.4 




A.4 Moore and Mealy Machine Equivalence 


The examples in the previous section show the difference between Moore and Mealy machines, but 
they also suggest that if a problem can be solved by a machine of one type, it can also be solved by 
one of the other type. In this sense, the two types of transducers are possibly equivalent. 


Definition A.3 


Two finite-state transducers M and N are said to be equivalent if they implement the same function, 
that is, if they have the same domain and if 


$$F_M(w) = F_N(w),$$


for all w in their common domain. 


Definition A.4 


Let $C_1$ and $C_2$ be two classes of finite-state transducers. We say that $C_1$ and $C_2$ are equivalent if for
every fst M in one class there exists an equivalent fst N in the other class, and vice versa.


We will now show that the Mealy and Moore machine classes are equivalent. For this we need to
introduce the extended transition function $\delta^*$ and the extended output function $\theta^*$. The expression

$$\delta^*(q_i, w) = q_j$$

is meant to indicate that the fst goes from state $q_i$ to state $q_j$ while processing the string $w$. Similarly,

$$\theta^*(q_i, w) = v$$

is to show that the fst produces output $v$ when starting in state $q_i$ and given input $w$. For both Mealy
and Moore machines $\delta^*$ is formally defined by

$$\delta^*(q_i, a) = \delta(q_i, a),$$
$$\delta^*(q_i, wa) = \delta(\delta^*(q_i, w), a),$$

for all $a \in \Sigma$ and all $w \in \Sigma^+$. For Mealy machines

$$\theta^*(q_i, a) = \theta(q_i, a),$$
$$\theta^*(q_i, wa) = \theta^*(q_i, w)\,\theta(\delta^*(q_i, w), a),$$

while for Moore machines

$$\theta^*(q_i, a) = \theta(\delta(q_i, a)),$$
$$\theta^*(q_i, wa) = \theta^*(q_i, w)\,\theta(\delta^*(q_i, wa)).$$

The conversion of a Moore machine to an equivalent Mealy machine is straightforward. The
states of the two machines are the same, and the symbol that is to be printed by the Moore machine is
assigned to the Mealy machine transition that leads to that state.


Example A.5 


The Moore machine in Figure A.5(a) is equivalent to the Mealy machine in Figure A.5(b). 
Figure A.5 


Theorem A.1 


For every Moore machine there exists an equivalent Mealy machine. 
Proof: Let $M = (Q, \Sigma, \Gamma, \delta_M, \theta_M, q_0)$ be a Moore machine. We then construct a Mealy machine
$N = (Q, \Sigma, \Gamma, \delta_N, \theta_N, q_0)$, where

$$\delta_N = \delta_M$$

and

$$\theta_N(q_i, a) = \theta_M(\delta_M(q_i, a)).$$

It is intuitively clear that M and N are equivalent. Both machines go through the same states in
response to a given input. M prints a symbol when a state is entered, N prints the same symbol during
the transition to that state. A more explicit proof involves an easy induction that we leave to the
reader. ∎
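
The construction in this proof is short enough to write out directly. The sketch below assumes a dictionary encoding of the two functions; the helper names and the sample Moore machine are hypothetical test data, not taken from the figures.

```python
from itertools import product

def run_moore(delta, theta, start, word):
    state, out = start, []
    for symbol in word:
        state = delta[(state, symbol)]
        out.append(theta[state])            # print on entering the state
    return "".join(out)

def run_mealy(delta, theta, start, word):
    state, out = start, []
    for symbol in word:
        out.append(theta[(state, symbol)])  # print during the transition
        state = delta[(state, symbol)]
    return "".join(out)

def moore_to_mealy(delta_m, theta_m):
    """delta_N = delta_M and theta_N(q, a) = theta_M(delta_M(q, a))."""
    theta_n = {(q, a): theta_m[target] for (q, a), target in delta_m.items()}
    return dict(delta_m), theta_n

# Hypothetical two-state Moore machine (a flip-flop), used only as test data.
delta_m = {("q0", "0"): "q0", ("q0", "1"): "q1",
           ("q1", "0"): "q1", ("q1", "1"): "q0"}
theta_m = {"q0": "0", "q1": "1"}
delta_n, theta_n = moore_to_mealy(delta_m, theta_m)

# Compare the two machines on every input of length 1 to 6.
same = all(run_moore(delta_m, theta_m, "q0", "".join(w)) ==
           run_mealy(delta_n, theta_n, "q0", "".join(w))
           for n in range(1, 7) for w in product("01", repeat=n))
print(same)  # True
```

The exhaustive comparison is only a sanity check on short inputs; the induction mentioned in the proof is what establishes equivalence for all strings.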


The conversion from a Mealy machine to an equivalent Moore machine is a little more
complicated because the states of the Moore machine now have to carry two pieces of information:
the internal state of the corresponding Mealy machine, and the output symbol produced by the Mealy
machine's transition to that state. In the construction we create, for each state $q_i$ of the Mealy machine,
$|\Gamma|$ states of the Moore machine labeled $q_{ia}$, $a \in \Gamma$. The output function for the Moore machine states
will be $\theta(q_{ia}) = a$. When the Mealy machine changes to state $q_i$ and prints $a$, the Moore machine will
go into state $q_{ia}$ and so print $a$ also.


Example A.6 


The Mealy machine in Figure A.6(a) and the Moore machine in Figure A.6(b) are equivalent. 
Figure A.6 


Theorem A.2 


For every Mealy machine N there exists an equivalent Moore machine M. 

Proof: Let $N = (Q_N, \Sigma, \Gamma, \delta_N, \theta_N, q_0)$ with $Q_N = \{q_0, q_1, \ldots, q_n\}$ be a Mealy machine. We then
construct a Moore machine $M = (Q_M, \Sigma, \Gamma, \delta_M, \theta_M, q_{0r})$ as follows: Create the states and the output
function of M by

$$q_{ia} \in Q_M$$

and

$$\theta_M(q_{ia}) = a,$$

for all $i = 0, 1, \ldots, n$ and all $a \in \Gamma$. For every transition rule

$$\delta_N(q_i, a) = q_j$$

and corresponding output function

$$\theta_N(q_i, a) = b$$

we introduce $|\Gamma|$ rules

$$\delta_M(q_{ic}, a) = q_{jb}$$

for all $c \in \Gamma$. Since the start state symbol is not printed before the first transition, the initial state for
M can be any state $q_{0r}$, $r \in \Gamma$. This completes the construction.

To show that N and M are equivalent, we first show that if

$$\delta_N^*(q_0, w) = q_k, \tag{A.1}$$

then there is a $c \in \Gamma$ such that

$$\delta_M^*(q_{0r}, w) = q_{kc}. \tag{A.2}$$

This, and its converse, can be proved by an induction on the length of w.

Next consider

$$\theta_N^*(q_0, wa) = \theta_N^*(q_0, w)\,\theta_N(\delta_N^*(q_0, w), a) \tag{A.3}$$

and suppose that $\delta_N^*(q_0, w) = q_k$, $\delta_N(q_k, a) = q_l$ and $\theta_N(q_k, a) = b$. Then

$$\theta_N(\delta_N^*(q_0, w), a) = b.$$

From (A.2) and

$$\delta_M(q_{kc}, a) = q_{lb}$$

it follows that

$$\theta_M^*(q_{0r}, wa) = \theta_M^*(q_{0r}, w)\,\theta_M(q_{lb}) = \theta_M^*(q_{0r}, w)\,b.$$

Returning now to (A.3)

$$\theta_N^*(q_0, wa) = \theta_N^*(q_0, w)\,\theta_N(\delta_N^*(q_0, w), a) = \theta_N^*(q_0, w)\,b.$$

If we now make the inductive assumption

$$\theta_N^*(q_0, w) = \theta_M^*(q_{0r}, w) \tag{A.4}$$

for all $|w| \le m$ and any $r \in \Gamma$ then

$$\theta_N^*(q_0, wa) = \theta_M^*(q_{0r}, w)\,b = \theta_M^*(q_{0r}, wa),$$

and so (A.4) holds for all m. ∎

Putting the two constructions together we get the fundamental equivalence result.


Theorem A.3 


The classes of Mealy machines and Moore machines are equivalent. 
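
The Mealy-to-Moore direction, as constructed in the proof of Theorem A.2, can be sketched as follows. Moore states are encoded here as pairs (q, c), standing for a state q of the Mealy machine together with an output symbol c; this encoding, the function names, and the choice of the first symbol of Γ for the start state are assumptions made for illustration. The sample data is the Mealy machine of Example A.1.

```python
def run_mealy(delta, theta, start, word):
    state, out = start, []
    for symbol in word:
        out.append(theta[(state, symbol)])
        state = delta[(state, symbol)]
    return "".join(out)

def run_moore(delta, theta, start, word):
    state, out = start, []
    for symbol in word:
        state = delta[(state, symbol)]
        out.append(theta[state])
    return "".join(out)

def mealy_to_moore(states, gamma, delta_n, theta_n, q0):
    # One Moore state (q, c) per Mealy state q and output symbol c.
    theta_m = {(q, c): c for q in states for c in gamma}
    delta_m = {}
    for (q, a), target in delta_n.items():
        b = theta_n[(q, a)]
        for c in gamma:                    # |Gamma| rules per Mealy transition
            delta_m[((q, c), a)] = (target, b)
    start = (q0, gamma[0])                 # any r in Gamma may be chosen
    return delta_m, theta_m, start

delta_n = {("q0", "0"): "q1", ("q0", "1"): "q0",
           ("q1", "0"): "q0", ("q1", "1"): "q1"}
theta_n = {("q0", "0"): "a", ("q0", "1"): "c",
           ("q1", "0"): "b", ("q1", "1"): "a"}
delta_m, theta_m, start = mealy_to_moore(["q0", "q1"], ["a", "b", "c"],
                                         delta_n, theta_n, "q0")

print(len(theta_m))                                # 6 Moore states
print(run_mealy(delta_n, theta_n, "q0", "1010"))   # caab
print(run_moore(delta_m, theta_m, start, "1010"))  # caab
```

The resulting machine has |Q| · |Γ| states, some of which may be unreachable; removing them is a separate step.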


A.5 Mealy Machine Minimization 


For a given function $\Sigma^* \to \Gamma^*$ there are many equivalent finite-state transducers, some of them
differing in the number of internal states. For practical reasons it is often important to find the 
minimal fst, that is, the machine with the smallest number of internal states. 


The first step in minimizing a Mealy machine is to remove the states that play no role in any 
computation because they cannot be reached from the initial state. When there are no inaccessible 
states, the fst is said to be connected. But a Mealy machine can be connected and yet not be minimal, 
as the next example illustrates. 


Example A.7 


The Mealy machine in Fig A.7(a) is connected, but it is clear that the states g; and q serve the same 
purpose and can be combined to give the machine in Figure A.7(b). 


Figure A.7 


Definition A.5 


Let $M = (Q, \Sigma, \Gamma, \delta, \theta, q_0)$ be a Mealy machine. We say that the states $q_i$ and $q_j$ are equivalent if and
only if

$$\theta^*(q_i, w) = \theta^*(q_j, w)$$

for all $w \in \Sigma^*$. States that are not equivalent are said to be distinguishable.


Definition A.6 


Two states $q_i$ and $q_j$ are said to be k-equivalent if

$$\theta^*(q_i, w) = \theta^*(q_j, w)$$

for all $|w| \le k$. Two states are k-distinguishable if there exists a $w$ with $|w| \le k$ such that
$\theta^*(q_i, w) \ne \theta^*(q_j, w)$.


Theorem A.4 


(a) Two states $q_i$ and $q_j$ of a Mealy machine are 1-distinguishable if there exists an $a \in \Sigma$ such that
$\theta(q_i, a) \ne \theta(q_j, a)$.

(b) The two states are k-distinguishable if they are (k − 1)-distinguishable or if there is an $a \in \Sigma$ such
that

$$\delta(q_i, a) = q_r,$$
$$\delta(q_j, a) = q_s,$$

where $q_r$ and $q_s$ are (k − 1)-distinguishable.


Proof: Part (a) follows directly from the definition and so does the conclusion that states are
k-distinguishable if they are (k − 1)-distinguishable. For Part (b), we know that there exists a $w$,
$|w| \le k - 1$, such that

$$\theta^*(q_r, w) \ne \theta^*(q_s, w).$$

Now

$$\theta^*(q_i, aw) = \theta(q_i, a)\,\theta^*(q_r, w)$$

and

$$\theta^*(q_j, aw) = \theta(q_j, a)\,\theta^*(q_s, w).$$

The result then follows. ∎


Theorem A.5 


Let $M_1 = (Q, \Sigma, \Gamma, \delta, \theta, q_0)$ be a Mealy machine in which all states are distinguishable. Then $M_1$ is
minimal and unique (within a relabeling of states).

Proof: Suppose that there exists an equivalent machine $M_2$ with fewer states than $M_1$. If $M_2$ has
fewer states than $M_1$, by the pigeonhole principle, at least one of the states of $M_2$ must combine
several functions of the states of $M_1$. Specifically, there must be two strings, each of which leads to a
different state in $M_1$, but which both lead to the same state in $M_2$. What possible structure can $M_2$
have?

Suppose, for the sake of illustration, that $M_1$ has the partial structure shown in Figure A.8 and we
try to combine states $q_i$ and $q_j$ for $M_2$. To preserve the equivalence, the partial structures of $M_2$ must
be as shown in Figure A.9. But since $q_i$ and $q_j$ are distinguishable in $M_1$, there must be some $w$ so
that

$$\theta^*(q_i, w) \ne \theta^*(q_j, w).$$


Figure A.8


Figure A.9


Therefore in $M_1$ the input strings $aw$ and $bw$ must produce different output. But in $M_2$ they clearly
print the same thing. This contradiction implies that, at this level, the two machines must be identical.
Since this reasoning can be extended to any part of the two machines, the theorem follows. ∎


The minimization of a Mealy machine therefore starts with an identification of equivalent states.

Partition Algorithm

1. With states $Q = \{q_0, q_1, \ldots, q_n\}$ find all states that are 1-equivalent to $q_0$ and partition Q into two sets
$\{q_0, q_i, \ldots, q_j\}$ and $\{q_k, \ldots, q_l\}$. The first of these sets will contain all states that are 1-equivalent to
$q_0$, the second will contain all states that are 1-distinguishable from it. Next, we repeat this
process with states $q_1, q_2, \ldots, q_n$. Removing duplicate sets, we are left with a partitioning based
on 1-equivalence.

2. For every pair of states $q_i$ and $q_j$ in the same equivalence class determine if there are transitions,
as in Theorem A.4, so that there are states $q_r$ and $q_s$ in different equivalence classes. If so, create
new equivalence classes to separate them. Check all pairs to find their appropriate equivalence
classes.

3. Repeat step 2 until no new equivalence classes are created.

At the end of this procedure, the state set Q will have been partitioned into equivalence classes
$E_1, E_2, \ldots, E_m$ so that all members of each class are equivalent in the sense of Definition A.5.

To justify the procedure, several points have to be addressed. The first is that after the kth pass
through step 2 all elements of an existing equivalence class are (k + 1)-equivalent. This follows by
induction, using part (b) of Theorem A.4.


The second point is that the process must terminate. This is clear since each pass through step 2
creates at least one new equivalence class, and there can be at most |Q| such classes.


Finally, we must show that a complete equivalence partitioning has been achieved when the
process stops. This can be seen from part (b) of Theorem A.4. For a pair $(q_i, q_j)$ to be
k-distinguishable, there must exist (k − 1)-distinguishable states $q_r$ and $q_s$; if no such states exist, no
states that can be distinguished by longer strings can exist. Therefore, that equivalence partitioning
must be complete.
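
The partition procedure can be sketched as a refinement loop. This is one common way to organize the computation, not a transcription of the pairwise checks described above: states are first grouped by their output rows (1-equivalence) and then regrouped by the classes of their successors until nothing changes. The machine used as input is hypothetical; it was chosen so that it goes through the same sequence of partitions that Example A.8 reports, and is not claimed to be the machine of Figure A.10.

```python
def equivalence_classes(states, sigma, delta, theta):
    def renumber(signature):
        # Give every distinct signature a small integer class id.
        ids = {}
        return {q: ids.setdefault(signature[q], len(ids)) for q in states}

    # Step 1: states with identical output rows are 1-equivalent.
    block = renumber({q: tuple(theta[(q, a)] for a in sigma) for q in states})
    while True:
        # Step 2: separate states whose successors lie in different classes.
        signature = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in sigma)
                     for q in states}
        refined = renumber(signature)
        # Step 3: stop when a pass creates no new class.
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    classes = {}
    for q in states:
        classes.setdefault(block[q], []).append(q)
    return sorted(classes.values())

states = ["q0", "q1", "q2", "q3", "q4"]
sigma = ["a", "b"]
delta = {("q0", "a"): "q1", ("q0", "b"): "q4", ("q1", "a"): "q2", ("q1", "b"): "q3",
         ("q2", "a"): "q1", ("q2", "b"): "q3", ("q3", "a"): "q4", ("q3", "b"): "q3",
         ("q4", "a"): "q3", ("q4", "b"): "q4"}
theta = {("q0", "a"): "x", ("q1", "a"): "y", ("q2", "a"): "y",
         ("q3", "a"): "z", ("q4", "a"): "x"}
theta.update({(q, "b"): "x" for q in states})

print(equivalence_classes(states, sigma, delta, theta))
# [['q0'], ['q1', 'q2'], ['q3'], ['q4']]
```

Each pass can only split classes, never merge them, so comparing the number of classes before and after a pass is enough to detect that the partition is stable.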


Minimum Mealy Machine Construction

Let $M = (Q, \Sigma, \Gamma, \delta, \theta, q_0)$ be the Mealy machine for which
we want to construct a minimal equivalent machine $P = (Q_P, \Sigma, \Gamma, \delta_P, \theta_P, E_p)$, where
$Q_P = \{E_1, E_2, \ldots, E_m\}$. First we find the equivalence classes $E_1, E_2, \ldots, E_m$ using the partition procedure and create
states labeled $E_1, E_2, \ldots, E_m$ for P. Pick an element $q_i$ from $E_r$ and an element $q_j$ from $E_s$. If $\delta(q_i, a) = q_j$
and $\theta(q_i, a) = b$, define the transition function for P by

$$\delta_P(E_r, a) = E_s$$

and output by

$$\theta_P(E_r, a) = b.$$

If the start state for M is $q_0$, the start state for P will be $E_p$ so that $q_0$ is in the equivalence class $E_p$. It
is a straightforward exercise to show that P is the minimal equivalent of M. It also follows from
Theorem A.5 that a minimal Mealy machine is unique within a simple relabeling of the states.
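
Given the equivalence classes, the construction of P amounts to picking one representative per class. The sketch below assumes the classes have already been computed and are passed in as lists; class indices stand in for the labels of the classes, and the three-state machine is hypothetical test data.

```python
def run_mealy(delta, theta, start, word):
    state, out = start, []
    for symbol in word:
        out.append(theta[(state, symbol)])
        state = delta[(state, symbol)]
    return "".join(out)

def quotient_mealy(classes, sigma, delta, theta, q0):
    """Build P with one state per equivalence class (states are class indices)."""
    index = {q: r for r, cls in enumerate(classes) for q in cls}
    delta_p, theta_p = {}, {}
    for r, cls in enumerate(classes):
        rep = cls[0]  # any member will do, since members of a class are equivalent
        for a in sigma:
            delta_p[(r, a)] = index[delta[(rep, a)]]
            theta_p[(r, a)] = theta[(rep, a)]
    return delta_p, theta_p, index[q0]

# Hypothetical machine in which q1 and q2 are equivalent.
delta = {("q0", "a"): "q1", ("q1", "a"): "q2", ("q2", "a"): "q1"}
theta = {("q0", "a"): "x", ("q1", "a"): "y", ("q2", "a"): "y"}
classes = [["q0"], ["q1", "q2"]]

delta_p, theta_p, start = quotient_mealy(classes, ["a"], delta, theta, "q0")
agree = all(run_mealy(delta, theta, "q0", "a" * n) ==
            run_mealy(delta_p, theta_p, start, "a" * n) for n in range(1, 8))
print(delta_p, agree)  # {(0, 'a'): 1, (1, 'a'): 1} True
```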


Example A.8 
Consider the Mealy machine in Figure A.10. Step 1 produces the equivalence partitioning $\{q_0, q_4\}$,
$\{q_1, q_2\}$, $\{q_3\}$. In the second step we find that $\delta(q_0, a) = q_1$, $\delta(q_4, a) = q_3$. Since $q_1$ and $q_3$ are
distinguishable, so are $q_0$ and $q_4$ and the new partition is

$$E_1 = \{q_0\}, \quad E_2 = \{q_1, q_2\}, \quad E_3 = \{q_3\}, \quad E_4 = \{q_4\}.$$

Another pass through step 2 yields no further refinement and the partitioning is finished. From it, we
construct the minimal Mealy machine in Figure A.11.


Figure A.10 


Figure A.11 


A.6 Moore Machine Minimization 


The minimization of a Moore machine follows the pattern of the minimization of Mealy machines, but 
there are some differences. While Definition A.5 and A.6 apply to Moore machines, Theorem A.4 
needs to be modified. 


Theorem A.6 


(a) Two states $q_i$ and $q_j$ of a Moore machine are 0-distinguishable if
$\theta(q_i) \ne \theta(q_j)$.

(b) The two states are k-distinguishable if they are (k − 1)-distinguishable or if there exists an $a \in \Sigma$
such that $\delta(q_i, a) = q_r$ and $\delta(q_j, a) = q_s$, where $q_r$ and $q_s$ are (k − 1)-distinguishable.

Proof: The argument here is essentially the same as in Theorem A.4. ∎


For the Moore machine minimization we first use the Mealy machine partition procedure to
partition the states into equivalence classes, and then use these classes in the construction that follows.


Minimal Moore Machine Construction 

Let $M = (Q, \Sigma, \Gamma, \delta, \theta, q_0)$ be the Moore machine to be minimized. First we establish the equivalence
classes $E_1, E_2, \ldots, E_m$ by the partition algorithm and create states labeled $E_1, E_2, \ldots, E_m$ for the
minimized machine P. Pick an element $q_i$ from $E_r$ and an element $q_j$ from $E_s$. If $\delta(q_i, a) = q_j$ and
$\theta(q_i) = b$, then the transition function for P is

$$\delta_P(E_r, a) = E_s$$

and

$$\theta_P(E_r) = b.$$


This will yield the minimal Moore machine. 


There is a minor complication with the minimal Moore machine construction. The minimization
process assigns $\theta(q_0)$ to the equivalence class associated with $q_0$. In some cases this can lead to
non-uniqueness.
Example A.9 


Look at the two Moore machines in Figure A.12. Clearly they are equivalent and also clearly they are
minimal. But since the outputs for the two initial states are different, they cannot be made identical by
just a relabeling. The difficulty comes from the fact that the output for the initial state is arbitrary
unless that state can be re-entered. But this is a trivial issue that can be ignored.


Figure A.12 


A.7 Limitations of Finite-State Transducers 


Mealy and Moore machines are finite-state automata so we rightly suspect that their capabilities are 
limited, just as finite accepters are limited. To explore these limitations we need something like the 
pumping lemma for regular languages. 


Theorem A.7 


Let $M = (Q, \Sigma, \Gamma, \delta, \theta, q_0)$ be a Mealy machine. Then there exists a state $q_i \in Q$ and a $w \in \Sigma^+$, such
that

$$\delta^*(q_i, w) = q_i.$$

Proof: This follows from the pigeonhole principle, noting that |Q| is finite but w can be arbitrarily
long. ∎


Example A.10 


Consider the function $F : \{a\}^* \to \{a, b\}^*$, defined by

$$F(a^{2n}) = a^n b^n,$$
$$F(a^{2n+1}) = a^n b^{n+1}.$$

Is there a Mealy machine that implements this function? A few tries quickly suggest that the answer is 
no. 


If there existed such a machine with m states, we could pick as input $w = a^{2m}$. During the
processing of the first part of this string, by Theorem A.7, the machine would have to go into a cycle 
in which an input of a produces an output of a. To escape from the cycle we would need an input 
other than a. But since there is no other input, the machine would continue to print a’s and so not 
represent the function. The contradiction shows that no such machine can exist. 


The conclusion from this example should not be surprising as it involves translating a regular set 
into one that is not regular. In fact, the structural similarity between fst's and finite accepters suggests 
a connection between regular languages and the output produced by finite-state transducers. 


Definition A.7 


Let M be a finite-state transducer implementing the function $F_M$, and let L be any language. Then

$$T_M(L) = \{F_M(w) : w \in L\}$$

is the M-translation of L.


Now an fst can generate any output since it can just reproduce input that is generally not limited.
But if the input is a regular language, so is the output. 


Theorem A.8 


Let $M = (Q, \Sigma, \Gamma, \delta_M, \theta_M, q_0)$ be a Mealy machine and let L be a regular language. Then $T_M(L)$ is also
regular.

Proof: If L is regular, then there exists a dfa $N = (P, \Sigma, \delta_N, p_0, F)$ such that $L = L(N)$. From M and N
we now construct a finite accepter (possibly nondeterministic) $H = (Q_H, \Gamma, \delta_H, q_{00}, F_H)$ as follows:

$$Q_H = \{q_{ij} : p_i \in P, q_j \in Q\}.$$

If $\delta_N(p_i, a) = p_k$, $\delta_M(q_j, a) = q_l$, and $\theta_M(q_j, a) = b$, then

$$\delta_H(q_{ij}, b) = q_{kl}. \tag{A.5}$$

The initial state for H will be $q_{00}$ and its final state set

$$F_H = \{q_{ij} : p_i \in F, q_j \in Q\}.$$

Then $T_M(L)$ is regular.

To defend this statement, notice first that if $\delta_N^*(p_i, w) = p_k$ and $\delta_M^*(q_j, w) = q_l$, then

$$\delta_H^*(q_{ij}, \theta_M^*(q_j, w)) = q_{kl}.$$

This follows by a straightforward induction. Therefore, if $\delta_N^*(p_0, w) \in F$, then

$$\delta_H^*(q_{00}, \theta_M^*(q_0, w)) \in F_H,$$

and if $w \in L$, then $F_M(w) \in L(H)$.

To complete the argument we must also show that if a string v is accepted by H, there must be a
$w \in L$, such that $v = F_M(w)$. Suppose now that instead of H we construct another dfa H′, identical to H,
except that (A.5) is replaced by

$$\delta_{H'}(q_{ij}, a) = q_{kl}.$$

Then N and H′ are equivalent and a string w is accepted by H′ if and only if $w \in L$. Now if $v \in L(H)$,
then in the transition graph for H there is a path from $q_{00}$ to a final state labeled v. But the same path in
H′ is labeled w, so $v = F_M(w)$. Therefore $w \in L$. ∎
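
The product construction used in this proof can be tried out on a small case. In the sketch below the states of H are pairs (p, q), and H is treated as nondeterministic by storing a set of targets for each (state, output symbol) pair. The machines M and N are hypothetical test data: M is a flip-flop Mealy machine and N accepts the strings over {0, 1} that end in 1. All function names are illustrative.

```python
from itertools import product

def run_mealy(delta, theta, start, word):
    state, out = start, []
    for symbol in word:
        out.append(theta[(state, symbol)])
        state = delta[(state, symbol)]
    return "".join(out)

def build_h(n_delta, m_delta, m_theta):
    """Rule (A.5): H moves from (p, q) to (p', q') on the OUTPUT symbol b."""
    h = {}
    for (p, a), p_next in n_delta.items():
        for (q, a2), q_next in m_delta.items():
            if a == a2:
                b = m_theta[(q, a)]
                h.setdefault(((p, q), b), set()).add((p_next, q_next))
    return h

def h_accepts(h, start, n_finals, word):
    current = {start}
    for b in word:
        current = set().union(*(h.get((s, b), set()) for s in current))
    return any(p in n_finals for p, _ in current)

m_delta = {("s0", "0"): "s0", ("s0", "1"): "s1", ("s1", "0"): "s1", ("s1", "1"): "s0"}
m_theta = {("s0", "0"): "0", ("s0", "1"): "1", ("s1", "0"): "1", ("s1", "1"): "0"}
n_delta = {("p0", "0"): "p0", ("p0", "1"): "p1", ("p1", "0"): "p0", ("p1", "1"): "p1"}
n_finals = {"p1"}

h = build_h(n_delta, m_delta, m_theta)
ok = True
for n in range(1, 7):
    words = ["".join(t) for t in product("01", repeat=n)]
    # Translations of the strings of length n that belong to L.
    translated = {run_mealy(m_delta, m_theta, "s0", w) for w in words if w.endswith("1")}
    ok = ok and all(h_accepts(h, ("p0", "s0"), n_finals, v) == (v in translated)
                    for v in words)
print(ok)  # True
```

Since a Mealy machine prints one symbol per input symbol, comparing H with the translated strings one length at a time is sufficient for this check.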


Appendix B 


JFLAP: 
A Recommendation 


The basic premise of this book is that understanding difficult abstract concepts is best
achieved through illustrative examples and challenging exercises, so problem solving is a
central theme of our approach.


Solving a difficult problem normally involves two distinct steps. First we must
understand the issues, decide what theorems and results apply, and how to put it all together
to arrive at a solution. This tends to be the most difficult part and often requires insight and
inventiveness. But once we have a clear understanding of the solution process, a more routine step is
still necessary to produce concrete results. In our study, this involves actually constructing automata
or grammars and testing them for correctness. This step may be less challenging but tedious and error
prone. It is here that mechanical help in the form of software can be very useful. In my experience,
JFLAP serves this purpose admirably.


JFLAP is an interactive tool built on the concepts in this book. It was created by Professor Susan 
Rodger and her students at Duke University. It has been used successfully in many universities over a
number of years. The CD that comes with this book gives you a brief introduction to JFLAP, how to
get it and how to use it. The CD also contains many exercises that illustrate the power of JFLAP as 
well as a library of functions that are helpful in studying the material in the book. 


JFLAP is useful in many ways. For the student, JFLAP gives a way of seeing how abstract
concepts are implemented in practice. Seeing how a difficult construction, such as the conversion of a 
dfa to a regular expression, is implemented brings to life something that may be difficult to grasp 
otherwise. Non-intuitive concepts, such as nondeterminism, are illustrated in a practical way. JFLAP 
is also a great time saver. Constructing, testing, and modifying automata and grammars can be done in 
a fraction of the time it takes with the more traditional pencil-and-paper method. Since extensive 
testing is easy, it will also improve the quality of the final product. 


Instructors can also benefit from JFLAP. Electronic submission and batch grading will save much 
effort, while at the same time increasing the accuracy and fairness of the evaluations. Exercises that 
are instructive, but often avoided because of the large amount of busywork involved, are now 
possible. An example here is the conversion from a right-linear grammar to an equivalent left-linear 
one. Working with Turing machines is notoriously onerous and error prone, but with JFLAP many 
more challenging assignments become reasonable. The enclosed CD has a large number of such 
exercises. Finally, since the CD contains the JFLAP implementations of many of the examples in the 
book, there is an opportunity for a dynamic classroom presentation of these examples. 


JFLAP is pretty much self-explanatory and little effort is needed in learning to use it. I strongly
recommend its use to both students and instructors. 


Answers 


Solutions and Hints for Selected Exercises 


Chapter 1 


Section 1.1 


5. To prove that two sets are equal, we must show that an element is in the first set if and only if it is
in the second. Suppose $x \in \overline{S_1 \cup S_2}$. Then $x \notin S_1 \cup S_2$, which means that x cannot be in $S_1$ or in
$S_2$, that is $x \in \overline{S_1} \cap \overline{S_2}$. Conversely, if $x \in \overline{S_1} \cap \overline{S_2}$, then x is not in $S_1$ and x is not in $S_2$, that is,
$x \in \overline{S_1 \cup S_2}$.


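6. This can be proven by an induction on the number of sets. Let $Z = S_1 \cup S_2 \cup \cdots \cup S_n$. Then
$S_1 \cup S_2 \cup \cdots \cup S_n \cup S_{n+1} = Z \cup S_{n+1}$. By the standard De Morgan's law,

$$\overline{Z \cup S_{n+1}} = \overline{Z} \cap \overline{S_{n+1}}.$$

With the inductive assumption, the relation is true for up to n sets, that is,

$$\overline{Z} = \overline{S_1} \cap \overline{S_2} \cap \cdots \cap \overline{S_n}.$$

Therefore,

$$\overline{Z \cup S_{n+1}} = \overline{S_1} \cap \overline{S_2} \cap \cdots \cap \overline{S_n} \cap \overline{S_{n+1}},$$

completing the inductive step.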


8. Suppose $S_1 = S_2$. Then $S_1 \cap \overline{S_2} = \overline{S_1} \cap S_2 = S_1 \cap \overline{S_1} = \emptyset$ and the entire expression is the empty
set. Suppose now that $S_1 \ne S_2$ and that there is an element x in $S_1$ but not in $S_2$. Then $x \in \overline{S_2}$ so
that $S_1 \cap \overline{S_2} \ne \emptyset$. The complete expression can then not be equal to the empty set.


13. If x is in $S_1$ and x is in $S_2$, then x is not in $(S_1 \cup S_2) - S_2$. Because of this, a necessary and
sufficient condition is that the two sets be disjoint.


17. (c) Since

$$\frac{n!}{n^n} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{1}{n}$$

is the product of factors less than or equal to one, $n! = O(n^n)$.


30. An argument by contradiction works. Suppose that $2 - \sqrt{2}$ were rational. Then

$$2 - \sqrt{2} = \frac{n}{m}$$

gives

$$\sqrt{2} = \frac{2m - n}{m},$$

contradicting the fact that $\sqrt{2}$ is not rational.


33. By induction. Suppose that every integer less than n can be written as a product of primes. If n 
is a prime, there is nothing to prove; if not, it can be written as the product 


$$n = n_1 n_2,$$


where both factors are less than n. By the inductive assumption, they both can be written as the 
product of primes, and so can n. 


Section 1.2 


2. Many string identities can be proven by induction. Suppose that $(uv)^R = v^R u^R$ for all $u \in \Sigma^*$ and
all $v$ of length n. Take now a string of length n + 1, say $w = va$. Then

$$\begin{aligned}
(uw)^R &= (uva)^R \\
&= a(uv)^R, && \text{by the definition of the reverse} \\
&= a v^R u^R, && \text{by the inductive assumption} \\
&= w^R u^R.
\end{aligned}$$

By induction then, the result holds for all strings.
4. Since abaabaaabaa can be decomposed into strings ab, aa, baa, ab, aa, each of which is in L, the
string is in $L^*$. Similarly, baaaaabaa is in $L^*$. However, there is no possible decomposition for
baaaaabaaaab, so this string is not in $L^*$. The strings aaaabaaaa and baaaaabaa are in $L^4$.
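
The decompositions claimed in this answer can be checked mechanically. The sketch below assumes L = {ab, aa, baa}, which is what the decomposition above suggests; the function names are illustrative.

```python
def in_power(word, language, k):
    """True if `word` is a concatenation of exactly k strings from `language`."""
    if k == 0:
        return word == ""
    return any(word.startswith(x) and in_power(word[len(x):], language, k - 1)
               for x in language)

def in_star(word, language):
    # Every string in the language is nonempty, so at most len(word) pieces are needed.
    return any(in_power(word, language, k) for k in range(len(word) + 1))

L = {"ab", "aa", "baa"}  # assumed language of the exercise

for w in ["abaabaaabaa", "baaaaabaa", "baaaaabaaaab", "aaaabaaaa"]:
    print(w, in_star(w, L), in_power(w, L, 4))
```

The recursion tries every string of L as a possible first piece, so it finds a decomposition whenever one exists.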


5. $L = \{\lambda, a, b, ab, ba\} \cup \{w \in \{a, b\}^* : |w| \ge 3\}$.

11. (d) We first generate three a’s, then add an arbitrary number of a’s and b’s anywhere.

$$S \to AaAaAaA,$$
$$A \to aA \mid bA \mid \lambda.$$

The first production generates three a’s. The second can generate any number of a’s and b’s in
any position. This shows that the grammar can generate any string $w \in \{a, b\}^*$ as long as
$n_a(w) \ge 3$.

12.

$$S \Rightarrow aA \Rightarrow abS \Rightarrow abaA \Rightarrow ababS$$

from which we see that

$$L(G) = \{(ab)^n : n \ge 0\}.$$
13. L = Ø, since no terminal string can be derived with these productions. 


14. (a) Generate an equal number of a’s and b’s, then one or more b’s as needed. 


$$S \to AB,$$
$$A \to aAb \mid \lambda,$$
$$B \to bB \mid b.$$

(d) The answer is easier to see if you notice that

$$L = \{a^{n+3} b^n : n \ge 0\}.$$

This leads to the easy solution

$$S \to aaaA,$$
$$A \to aAb \mid \lambda.$$


15. (b) The problem is simplified if you break it into two cases, |w| mod 3 = 1 and |w| mod 3 = 2.
The first is covered by

$$S_1 \to aaaS_1 \mid a,$$

the second by

$$S_2 \to aaaS_2 \mid aa.$$

The two can be combined into a single grammar by

$$S \to S_1 \mid S_2.$$


18. (a) We can use the trick and results of Example 1.13. Let $L_1$ be the language in Example 1.13
and modify that grammar so that the start symbol is $S_1$. Consider then a string $w \in L$. If this
string starts with an a, then it has the form $w = aw_1$, where $w_1 \in L_1$. This situation can be
taken care of by $S \to aS_1$. If it starts with a b, it can be derived by $S \to S_1 S$.


23. The first grammar can derive $S \Rightarrow SS \stackrel{*}{\Rightarrow} aa$, but the second grammar cannot derive this string.


Section 1.3 


1. 
integer → sign magnitude
sign → + | − | λ
magnitude → digit | digit magnitude
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


This can be considered an ideal version of C, as it puts no limit on the length of an integer. Most 
real compilers, though, place a limit on the number of digits. 
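
The grammar for integers describes a regular language, so it can be mirrored by a regular expression. The following sketch uses Python's `re` module; the pattern and the function name are illustrative and, like the grammar, place no limit on the number of digits.

```python
import re

# sign -> + | - | lambda, followed by magnitude -> one or more digits
INTEGER = re.compile(r"[+-]?[0-9]+")

def is_integer(text):
    return INTEGER.fullmatch(text) is not None

for sample in ["+123", "-0", "42", "", "+", "1.5"]:
    print(repr(sample), is_integer(sample))
```

The first three samples are accepted; the empty string, a bare sign, and 1.5 are rejected because the magnitude must consist of at least one digit and nothing else.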


8. The automaton has to remember the input for one time period so that it can be reproduced for 
output later. Remembering can be done by labeling the state with the appropriate information. The 
label of the state is then produced as output later.


11. We can remember input by labeling the states mnemonically. When a set of three bits is done, 
we produce output and return to the beginning to process the next three bits. The following 
solution is partial, but the completion should be obvious. 


12. In this case, the transducer must remember the two preceding input symbols and make transitions 
so that the needed information is kept track of. 




Chapter 2 


Section 2.1 


2. (c) Break it into four cases, each with an accepting state: no a’s, one a, two a’s, three a’s. A
fourth a will then send the dfa into a non-accepting trap state. A solution:


5. (a) The first six symbols are checked. If they are not correct, the string is rejected. If the prefix 
is correct, we keep track of the last two symbols read, putting the dfa in an accepting state if 
the suffix is bb. 


7. (a) Use states labeled with |w| mod 3. The solution then is quite simple. 


(d) For this we use nine states, with the first part of each labeled $n_a(w) \bmod 3$, the second
part, $n_b(w) \bmod 3$. The transitions and the final states are then simple to figure out.


9. (a) Count consecutive zeros to get the main part of the dfa. 

Then put in additional transitions to keep track of consecutive zeros and to trap unacceptable
strings. Also provide for accepting λ and 0.


(d) Here we need to remember all combinations of three bits. This requires 8 states plus some 
start-up. The solution is a little long but not hard. A partial sketch of the solution is below. 


13. The easiest way to solve this problem is to construct a dfa for $L = \{a^n : n = 4\}$, and then
complement the solution.


14. Label vertices with two numbers, the first $|w| \bmod 3$, the second $|w| \bmod 5$. Then the states labeled
03, 20, etc. are made final states.


23. (a) By contradiction. Suppose $G_M$ has no cycles in any path from the initial state to any final
state. Then every walk has a finite number of steps, and so every accepted string has to be
of finite length. But this implies that the language is finite.


(b) Also by contradiction. Assume that $G_M$ has some cycle in a path from the initial state to
some accepting state. We can then use the cycle to generate an arbitrarily long walk labeled 
with an accepted string. But a finite language cannot contain arbitrarily long strings. 


25. There are many different solutions. Here is one of them. 




Section 2.2 


3. The complement of the language in Figure 2.8 is $\{a^n : n \text{ is odd}, n \ne 3\} \cup \{\lambda\}$. A dfa for this
language is


Note that you cannot just complement the final state set in Figure 2.8. 


5. $\delta^*(q_0, a) = \{q_0, q_1, q_2\}$, $\delta^*(q_1, \lambda) = \{q_0, q_1, q_2\}$.


8. A four-state solution is trivial, but it takes a little experimenting to get a three-state one. Here is 
one answer: 


9. No. The string abc has three different symbols and there is no way this can be accepted with fewer 
than three states. 


16. This is the kind of problem in which you just have to try different ways. Probably most of your 
tries will not work. Here is one that does. 



18. Introduce a single starting state $p_0$. Then add a transition

$$\delta(p_0, \lambda) = Q_0.$$

Next, remove starting state status from $Q_0$. It is straightforward to see that the new nfa is
equivalent to the original one.


21. Introduce a non-accepting trap state and make all undefined transitions to this new state. 
Solution: 




Section 2.3 
2. Just follow the procedure nfa-to-dfa. This gives the dfa 


7. Introduce a new final state $p_f$ and for every $q \in F$ add the transitions

$$\delta(q, \lambda) = \{p_f\}.$$

Then make $p_f$ the only final state. It is a simple matter then to argue that if $\delta^*(q_0, w) \in F$
originally, then $\delta^*(q_0, w) = \{p_f\}$ after the modification, so both the original and the modified
nfa's are equivalent.


Since this construction requires A-transitions, it cannot be made for dfa's. Generally, it is 
impossible to have only one final state in a dfa, as can be seen by constructing a dfa that accepts 
$\{\lambda, a\}$.


8. Getting an answer requires some thought. One solution is 


11. Suppose that $L = \{w_1, w_2, \ldots, w_m\}$. Then the nfa

(figure: an nfa with a separate path accepting each $w_i$)

accepts L, so the language is regular.


14. This is not easy to see. The trick is to use a dfa for L and modify it so that it remembers if it has 
read an even or an odd number of symbols. 


This can be done by doubling the number of states and adding O or E to the labels. For example, 
if part of the dfa is 


With a few examples you should be able to convince yourself that if the original dfa accepts
$a_1 a_2 a_3 a_4 \ldots$, the new automaton will accept $\lambda a_2 \lambda a_4 \ldots$, and therefore even (L).


15. Suppose we have a dfa that accepts L. We then
(a) Identify all states $Q_2$ that can be reached from $q_0$, reading any two-symbol prefix v, that is,
$Q_2 = \{q \in Q : \delta^*(q_0, v) = q, |v| = 2\}$.
(b) Introduce a new initial state $p_0$ and add
$\delta(p_0, \lambda) = Q_2$.


It should not be hard to see that the new nfa accepts chop2 (L). Although the construction is 
plausible, a complete answer requires a proof of the last statement. 


Section 2.4 
2. (c) 




This is minimal for the following reason. q3 É F and q4 € F, so q3 and q; are distinguishable. 
Next, 0° (qo, a) É F and 0° (q4, a) € F, so q and q4 are distinguishable. Similarly, 0° (q,, aa) € F 
and ô“ (q3, aa) € F, so q, and q; are distinguishable. Continuing this way, we see that all states 
are distinguishable and therefore the dfa is minimal. 


4. First, remove the inaccessible states q) andq,. Then use the procedure mark to find the 


indistinguishable pairs (qo, q1) and (q3,q5). This then gives the minimal dfa. 


6. By contradiction. Assume that $\widehat{M}$ is not minimal. Then we can construct a smaller dfa $\widetilde{M}$ that
accepts $\overline{L}$. In $\widetilde{M}$, complement the final state set to give a dfa for L. But this dfa is smaller than M,
contradicting the assumption that M is minimal.


10. By contradiction. Assume that $q_b$ and $q_c$ are indistinguishable. Since $q_a$ and $q_b$ are
indistinguishable and indistinguishability is an equivalence relation (Exercise 7), $q_a$ and $q_c$ must
be indistinguishable.


Chapter 3 


Section 3.1 
2. Yes, because $((0 + 1)(0 + 1)^*)^*$ denotes any string of 0’s and 1’s.


6. (a) Separate into cases m = 0, 1, 2, 3. Generate 4 or more a’s, followed by the requisite number of
b’s. Solution: $aaaaa^*(\lambda + b + bb + bbb)$.


(c) The complement of the language in 6(a) is harder to find. A string is not in L if it is of the
form $a^n b^m$, with either n < 4 or m > 3, but this does not completely describe $\overline{L}$. We must also
consider the case where a b is followed by an a.

$$(\lambda + a + aa + aaa)b^* + a^*bbbbb^* + (a + b)^* ba (a + b)^*.$$


10. Split into three cases: (i) $m = 1$, $n \ge 3$, (ii) $n \ge 2$, $m \ge 2$, and (iii) $n = 1$, $m \ge 3$. Each case has a
straightforward solution. 


13. Enumerate all cases with |v| = 2 to get
$$aa(a + b)^*aa + ab(a + b)^*ab + ba(a + b)^*ba + bb(a + b)^*bb.$$
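
An answer of this kind can be checked by brute force over all short strings. The sketch below assumes the language is the set of strings that begin and end with the same two-symbol string v, with the two occurrences not overlapping; that reading is an assumption based on the answer above. The regular expression is written in Python's `re` syntax, where `|` plays the role of `+`.

```python
import re
from itertools import product

PATTERN = re.compile(r"aa[ab]*aa|ab[ab]*ab|ba[ab]*ba|bb[ab]*bb")

def in_language(w):
    # Assumed language: w = v x v with |v| = 2 and x any string over {a, b}.
    return len(w) >= 4 and w[:2] == w[-2:]

mismatches = [w for n in range(9)
              for w in ("".join(t) for t in product("ab", repeat=n))
              if bool(PATTERN.fullmatch(w)) != in_language(w)]
print(mismatches)  # []
```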
16. (c) You just have to get each symbol once. The term 
(at+b+c)'a(atb+c) b(at+b+e)* c(atb+e)’ 


will do this, but is not enough since the a will precede the b, etc. For the complete solution you 
must generate all permutations of the three symbols, giving six terms that can be added. The 
answer, although quite long, is conceptually not hard. 


17. (c) Create two 0’s, interspersed with 1’s, then repeat. But don't forget the case when there are 
no 0’s at all. Solution: (1°01°01°)* + 1°. 


18. (a) Create all strings of length three and repeat. A short solution is ((a + b + c)(a + b + c)(a + b
+ c))*.


20. (c) The statement
(r1 + r2)* = (r1*r2*)*


is true. By the given rules (r1 + r2)* denotes the language (L(r1) ∪ L(r2))*, that is, the set of all
strings of arbitrary concatenations of elements of L(r1) and L(r2). But (r1*r2*)* denotes ((L(r1))*
(L(r2))*)*, which is the same set.


23. The expression for an infinite language must involve at least one starred subexpression, 
otherwise it can only denote finite strings. If there is one starred subexpression that denotes a


non-empty string, then this string can be repeated as often as desired and therefore denote 
arbitrarily long strings. 


25. A closed contour will be generated by an expression r if and only if n_l(r) = n_r(r) and n_u(r) =
n_d(r).


27. Notice several things. The bit string must be at least 6 bits long. If it is longer than 6 bits, its 
value is at least 64, so anything will do. If it is exactly 6 bits, then either the second bit from the 
left (16) or the third bit from the left (8) must be 1. If you see this, then the solution 


(111 + 110 + 101)(0 + 1)(0 + 1)(0 + 1)
+ 1(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)*


readily suggests itself. 
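The reasoning can be tested mechanically. The sketch below makes two assumptions that are not stated in this answer: the exercise asks for binary numbers with value at least 40 (the threshold suggested by the remarks about the bits worth 16 and 8), and the strings have no leading zeros. The function names are illustrative.

```python
import re
from itertools import product

# The two terms of the solution, with '+' written as '|':
# six-bit strings starting with 111, 110 or 101, or any string of seven or more bits.
PATTERN = re.compile(r"(?:111|110|101)[01][01][01]|1[01]{6}[01]*")


def accepted(bits: str) -> bool:
    """True if the whole bit string is denoted by the expression."""
    return PATTERN.fullmatch(bits) is not None


def check(max_len: int = 9) -> bool:
    """Compare with the assumed condition value >= 40 on strings starting with 1."""
    for n in range(1, max_len + 1):
        for tail in product("01", repeat=n - 1):
            s = "1" + "".join(tail)
            if accepted(s) != (int(s, 2) >= 40):
                return False
    return True
```

Under these assumptions `check()` examines every string of up to nine bits that begins with 1.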


Section 3.2 


3. This can be solved from first principles without going through the regular expression to nfa 
construction. The latter will, of course, work but gives a more complicated answer. Solution: 


[Figure: transition graph of the nfa]


Then use the nfa to dfa algorithm in a routine manner. 


8. Removing the middle vertex gives 


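[Figure: generalized transition graph after removing the middle vertex; recoverable edge labels: (a + b)ab, a, bb + ab]


By Equation (3.1), the language accepted then is L(r), where
r = a*(a + b)ab(ab + bb + aa*(a + b)ab)*.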


10. (b) First, we have to modify the nfa so that it satisfies the conditions imposed by the 


construction in Theorem 3.2, one of which is q0 ∉ F. This is easily done.


[Figure: the modified nfa]


Then remove state 3. 


[Figure: generalized transition graph after removing state 3; recoverable edge labels: ab, ba, aa + b]


Next, remove state 4. 


[Figure: the remaining vertex with loop label ab + (aa + b)(ba)*bb]


The regular expression then is r = (ab + (aa + b)(ba)*bb)*.


16. (a) This is a hard problem until you see the trick. Start with a dfa with states q0, q1, ..., and
introduce a “parallel” automaton with states q̂0, q̂1, .... Then arrange matters so that the
spurious symbol nondeterministically transfers from any state of the original automaton to
the corresponding state in the parallel part. For example, if part of the original dfa looks
like


[Figure: part of the original dfa]


then the dfa with its parallel will be an nfa whose corresponding part is


[Figure: corresponding part of the nfa]


It is not hard to make the argument that the original dfa accepts L if and only if the constructed 
nfa accepts insert (L). 


Section 3.3 
4. Right linear grammar: 


S → aaA,
A → aA | B,
B → bbbC,
C → bC | λ.
Left linear grammar:
S → Abbb,
A → Ab | B,
B → Caa,
C → Ca | λ.


8. We can show by induction that if w is a sentential form derived with G, then w^R can be derived in
the same number of steps by Ĝ.
Because w is created with left linear derivations, it must have the form w = Aw1, with A ∈ V and
w1 ∈ T*. By the inductive assumption
w^R = w1^R A can be derived via Ĝ. If we now apply A → Bv, then
w ⇒ Bvw1.
But Ĝ contains the rule A → v^R B, so we can make the derivation


w^R ⇒ w1^R v^R B
= (Bvw1)^R,
completing the inductive step.


11. Split this into two cases: (i) n and m are both even and (ii) n and m are both odd. The solution
then falls out easily, with


S → aaS | A,
A → bbA | λ


taking care of case (i). 


13. (a) First construct a dfa for L. This is straightforward and gives transitions such as 


δ(q0, a) = q1, δ(q0, b) = q2,
δ(q1, a) = q0, δ(q1, b) = q3,
δ(q2, a) = q3, δ(q2, b) = q0,
δ(q3, a) = q2, δ(q3, b) = q1,
with q0 the initial and final state. Then the construction of Theorem 3.4 gives the answer


q0 → aq1 | bq2 | λ,
q1 → bq3 | aq0,
q2 → aq3 | bq0,
q3 → aq2 | bq1.
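The passage from the dfa to the grammar is mechanical, which a short program can illustrate. The sketch below is an assumption-laden reading of the construction used in this answer: every transition δ(q, x) = p becomes a production q → xp, and every final state q receives q → λ. The transition table is the one reconstructed above, and the function names are hypothetical.

```python
# Transition table of the dfa (states q0..q3, alphabet {a, b}), as read from the answer.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q0", ("q1", "b"): "q3",
    ("q2", "a"): "q3", ("q2", "b"): "q0",
    ("q3", "a"): "q2", ("q3", "b"): "q1",
}
FINAL = {"q0"}


def dfa_to_grammar(delta, final):
    """Return right-linear productions: q -> x p for each move, q -> λ for final q."""
    productions = {}
    for (state, symbol), target in delta.items():
        productions.setdefault(state, []).append(symbol + target)
    for state in final:
        productions.setdefault(state, []).append("λ")
    return productions


def accepts(delta, start, final, w):
    """Run the dfa on w and report whether it ends in a final state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in final
```

For example, `dfa_to_grammar(DELTA, FINAL)["q0"]` lists the right sides aq1, bq2, and λ, which agrees with the first line of the grammar.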


17. Obviously, L(G1) is regular, as is L(G2). We can show that their union is also regular by
constructing the following nfa.


[Figure: combined automaton containing the nfa for L(G1) and the nfa for L(G2)]


The condition that V1 and V2 should be disjoint is essential so that the two nfa's are distinct.


Chapter 4 


Section 4.1 
2. (a) The construction is straightforward, but tedious. A dfa for L((a + b)a*) is given by
δ(q0, a) = q1, δ(q0, b) = q1, δ(q1, a) = q1, δ(q1, b) = qt,
with qt a trap state and final state q1. A dfa for L(baa*) is given by


δ(p0, a) = pt, δ(p0, b) = p1, δ(p1, a) = p2,


δ(p1, b) = pt, δ(p2, a) = p2, δ(p2, b) = pt,


with final state p2. From this we find


δ((q0, p0), a) = (q1, pt), δ((q0, p0), b) = (q1, p1),


δ((q1, p1), a) = (q1, p2), δ((q1, p2), a) = (q1, p2),


etc. When we complete this construction, we see that the only final state is (q1, p2) and that
L((a + b)a*) ∩ L(baa*) = L(baa*).
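The tedious part of this construction is easily delegated to a program. The following sketch builds the reachable part of the product automaton and compares it with L(baa*). The state names qt and pt for the trap states, and their self-loops, are assumptions made to complete the two transition tables; the function names are illustrative.

```python
import re
from itertools import product

# dfa for L((a + b)a*); qt is a trap state (its self-loops are assumed).
D1 = {("q0", "a"): "q1", ("q0", "b"): "q1", ("q1", "a"): "q1", ("q1", "b"): "qt",
      ("qt", "a"): "qt", ("qt", "b"): "qt"}
F1 = {"q1"}
# dfa for L(baa*); pt is a trap state (its self-loops are assumed).
D2 = {("p0", "a"): "pt", ("p0", "b"): "p1", ("p1", "a"): "p2", ("p1", "b"): "pt",
      ("p2", "a"): "p2", ("p2", "b"): "pt", ("pt", "a"): "pt", ("pt", "b"): "pt"}
F2 = {"p2"}


def reachable_product(d1, s1, f1, d2, s2, f2, alphabet="ab"):
    """Product dfa restricted to the pairs reachable from (s1, s2)."""
    start = (s1, s2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        q, p = todo.pop()
        for c in alphabet:
            nxt = (d1[(q, c)], d2[(p, c)])
            delta[((q, p), c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    # A pair is final for the intersection when both components are final.
    final = {(q, p) for (q, p) in seen if q in f1 and p in f2}
    return delta, start, final


def run(delta, start, final, w):
    """Run a dfa given as a transition dictionary."""
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in final


def agrees_with_baa_star(max_len=6):
    """Compare the product dfa with L(baa*) on all short strings."""
    delta, start, final = reachable_product(D1, "q0", F1, D2, "p0", F2)
    for n in range(max_len + 1):
        for symbols in product("ab", repeat=n):
            w = "".join(symbols)
            if run(delta, start, final, w) != (re.fullmatch(r"baa*", w) is not None):
                return False
    return True
```

The set of final pairs returned by `reachable_product` contains only (q1, p2), in agreement with the statement above.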


7. Notice that
nor(L1, L2) = complement of (L1 ∪ L2) = L̄1 ∩ L̄2.
The result then follows from closure under intersection and complementation.
12. The answer is yes. It can be obtained by starting from the set identity
L2 = ((L1 ∪ L2) ∩ L̄1) ∪ (L1 ∩ L2).


The key observation is that since L1 is finite, L1 ∩ L2 is finite and therefore regular for all L2.
The rest then follows easily from the known closures under union and complementation.


14. By closure under reversal, LĒ is regular. The result then follows from closure under 
concatenation. 


16. Use L1 = Σ*. Then, for any L2, L1 ∪ L2 = Σ*, which is regular. The given statement would then
imply that any L2 is regular.


18. We can use the following construction. Find all states P such that there is a path from the initial 
vertex to some element of P, and from that element to a final state. Then make every element of P 
a final state. 


26. Suppose G1 = (V1,T,S1,P1) and G2 = (V2,T,S2,P2). Without loss of generality, we can assume that
V1 and V2 are disjoint. Combine the two grammars and


(a) Make S the new start symbol and add productions S → S1|S2.
(b) In P}, replace every production of the form A —> x, with A € V; and x € T“, by A > xS). 


(c) In P}, replace every production of the form A — x, with A € V}, and x € T“, by A > xS}, S| 


>À. 


Section 4.2 
1. Since by Example 4.1 L1 − L2 is regular, there exists a membership algorithm for it.


2. If L1 ⊆ L2, then L1 ∪ L2 = L2. Since L1 ∪ L2 is regular and we have an algorithm for set equality,
we also have an algorithm for set inclusion.
5. From the dfa for L, construct the dfa for L^R, using the construction suggested in Theorem 4.2. Then
use the equality algorithm in Theorem 4.7.
12. Here you need a little trick. If L contains no even-length strings, then 


L ∩ L((aa + ab + ba + bb)*) = ∅.


The left side is regular, so we can use Theorem 4.6. 


Section 4.3 


2. For the dfa for L to process the middle string v requires a walk in the transition graph of length |v|.
If this is longer than the number of states in the dfa, there must be a cycle labeled y in this walk. 
But clearly this cycle can be repeated as often as desired without changing the acceptability of a 
string. 


4. (a) Given m, pick w = a^m b^m a^(2m). The string y must then be a^k and the pumped strings will be


w_i = a^(m+(i−1)k) b^m a^(2m).


If we take i ≥ 2, then m + (i − 1)k > m, and then w_i is not in L.
(e) It does not seem easy to apply the pumping lemma directly, so we proceed indirectly.
Suppose that L were regular. Then by the closure of regular languages under
complementation, L̄ would also be regular. But L̄ = {w : n_a(w) = n_b(w)}, which, as is
easily shown, is not regular. By contradiction, L is not regular.


5. (a) Take p to be the smallest prime number greater than or equal to m and choose w = a^p. Now y is a
string of a’s of length k, so that


w_i = a^(p+(i−1)k).


If we take i − 1 = p, then p + (i − 1)k = p(k + 1) is composite and w_(p+1) is not in the language.
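The arithmetic behind this choice of i can be checked numerically. The sketch below assumes the language is the set of strings a^n with n prime, and it tries every admissible length k of y between 1 and m. All function names are illustrative.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def smallest_prime_at_least(m: int) -> int:
    """Smallest prime p with p >= m."""
    p = max(m, 2)
    while not is_prime(p):
        p += 1
    return p


def pumped_length(p: int, k: int, i: int) -> int:
    """Length of w_i = a^(p + (i - 1)k)."""
    return p + (i - 1) * k


def check(m: int) -> bool:
    """With i = p + 1 the pumped length is p(k + 1), composite for every k >= 1."""
    p = smallest_prime_at_least(m)
    return all(not is_prime(pumped_length(p, k, p + 1)) for k in range(1, m + 1))
```

For instance, with p = 5 and k = 2 the choice i = 6 gives length 5 + 5·2 = 15, which is composite.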


14. The proposition is false. As a counterexample, take L1 = {a^n b^m : n ≤ m} and L2 = {a^n b^m : n >
m}, both of which are non-regular. But L1 ∪ L2 = L(a*b*), which is regular.


15. (a) The language is regular. This is most easily seen by splitting the problem into cases such as l
= 0, k = 0, n > 5, for which one can easily construct regular expressions.


(b) This language is not regular. If we choose w = aaaaaa b^m a^m, our opponent has several
choices. If y consists of only a’s, we use i = 0 to violate the condition n > 5. If the opponent
chooses y as consisting of b’s, we can then violate the condition k < l.


17. L is regular. We see this from = Lı N L7 and the known closures for regular languages. 
19. (a) The language is regular, since any string that has two consecutive symbols that are the same is 


in the language. A regular expression for L is (a + b)(a + b)*(aa + bb)(a + b)(a + b)*.


(b) The language is not regular. Take w = (ab)^m aa (ba)^m. The adversary now has several
choices, such as y = (ab)^k or y = (ab)^k a. In the first case


w_0 = (ab)^(m−k) aa (ba)^m.


Since the only possible identification is ww^R = (ab)^l aa (ba)^l, the prefix u is shorter than the
suffix v, and w_0 is not in L. With the second choice, the length of w_0 is odd, so it cannot be in L
either.


21. Take L_i = {a^i b^i}, i = 0, 1, .... For each i, L_i is finite and therefore regular, but the union of all the
languages is the nonregular language L = {a^n b^n : n ≥ 0}.


25. A rectangle is described by u^n r^m d^n l^m. This is not regular, by a straightforward application of the
pumping lemma. 


Chapter 5 


Section 5.1 


4. It is quite obvious that any string generated by this grammar has the same number of a’s as b’s. To 
show that the prefix condition n_a(v) ≥ n_b(v) holds, we carry out an induction on the length of the


derivation. Suppose that for every sentential form derived from S in n steps this condition holds. 
To get a sentential form in n + 1 steps, we can apply S → λ or S → SS. Since neither of these
changes the number of a’s and b’s or the location of those already there, the prefix condition
continues to hold. Alternatively, we apply S → aSb. This adds an extra a and an extra b, but since
the added a is to the left of the added b, the prefix condition will still be satisfied. Thus, if the 
prefix condition holds after n steps, it will still hold after n+1 steps. Obviously, the prefix 
condition holds after one step, so we have a basis and the induction succeeds. 


7. (a) First, solve the case n = m+3. Then add more b’s. This can be done by 


S → aaaA,
A → aAb | B,
B → Bb | λ.


But this is incomplete since it creates at least three a’s. To take care of the cases n = 0, 1, 2, we
add


S → A | aA | aaA.
(d) This has an unexpectedly simple solution


S → aSbb | aSbbb | λ.


These productions nondeterministically produce either bb or bbb for each generated a. 
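A small experiment can make the effect of this nondeterministic choice visible. The sketch below expands the productions S → aSbb | aSbbb | λ directly and compares the result with the set {a^n b^m : 2n ≤ m ≤ 3n}; that description of the target language is an assumption inferred from the remark above, and the function names are illustrative.

```python
def derive(max_a: int) -> set:
    """All terminal strings derivable using at most max_a applications of S -> aS(bb|bbb)."""
    done = set()
    forms = {("", "")}  # (prefix, suffix) surrounding the single variable S
    for _ in range(max_a + 1):
        # Finish every current sentential form with S -> λ.
        done |= {prefix + suffix for prefix, suffix in forms}
        # Apply S -> aSbb and S -> aSbbb to every current sentential form.
        forms = {(prefix + "a", tail + suffix)
                 for prefix, suffix in forms
                 for tail in ("bb", "bbb")}
    return done


def expected(max_a: int) -> set:
    """The assumed target language, restricted to at most max_a a's."""
    return {"a" * n + "b" * m
            for n in range(max_a + 1)
            for m in range(2 * n, 3 * n + 1)}
```

Comparing `derive(5)` with `expected(5)` shows that every number of b’s between 2n and 3n is reachable for each n up to 5.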


8. (a) For the first case n = m and k is arbitrary. This can be achieved by


S1 → AC,
A → aAb | λ,
C → Cc | λ.


In the second case, n is arbitrary and m ≤ k. Here we use


S2 → BD,
B → aB | λ,
D → bDc | E,
E → Ec | λ.


Finally, we start productions with S → S1 | S2.


(e) Split the problem into two cases: n = k + m and m = k + n. The first case is solved by


S → aSc | S1 | λ,


S1 → aS1b | λ.


13. (a) If S derives L, then S1 → SS derives L^2.


16. It is normally not possible to use a grammar for L directly to get a grammar for L̄, so we need
another, hopefully recursive description for L̄. This is a little hard to see here. One obvious
subset of L̄ contains the strings of odd length, but this is not all.


Suppose we have an even-length string that is not of the form ww^R. Working from the center to
the left and to the right simultaneously, compare corresponding symbols. While some part around
the center can be of the form ww^R, at some point we get an a on the left and a b in the
corresponding place on the right, or vice versa. The string must therefore be of the form
uaww^Rbv or ubww^Rav with |u| = |v|. Once we see this, we can then construct grammars for these
types of strings. One solution is


S → ASA | B,
A → a | b,
B → bCa | aCb,
C → aCa | bCb | λ.


The first two productions generate the u and v, the third the two disagreeing symbols, and the 
last the innermost palindrome. 
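The claim that these productions capture exactly the even-length strings not of the form ww^R can be tested on short strings. The sketch below is a hand-written recognizer that mirrors the productions S → ASA | B, A → a | b, B → bCa | aCb, C → aCa | bCb | λ over the alphabet {a, b}; it is not a general parser, and its function names are illustrative.

```python
from itertools import product


def derives_C(s: str) -> bool:
    """C -> aCa | bCb | λ generates exactly the even-length palindromes."""
    return len(s) % 2 == 0 and s == s[::-1]


def derives_B(s: str) -> bool:
    """B -> bCa | aCb: two disagreeing end symbols around an even palindrome."""
    return len(s) >= 2 and s[0] != s[-1] and derives_C(s[1:-1])


def derives_S(s: str) -> bool:
    """S -> ASA | B with A -> a | b: strip arbitrary end symbols until B applies."""
    if derives_B(s):
        return True
    return len(s) >= 2 and derives_S(s[1:-1])


def check(max_len: int = 8) -> bool:
    """Compare with 'even length and not a palindrome' on all short strings."""
    for n in range(max_len + 1):
        for symbols in product("ab", repeat=n):
            w = "".join(symbols)
            if derives_S(w) != (len(w) % 2 == 0 and w != w[::-1]):
                return False
    return True
```

The odd-length strings, which also belong to the complement, are not covered by these productions and need the separate treatment mentioned above.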


20. The only possible derivations start with 


S ⇒ aaB ⇒ aaAa ⇒ aabBba ⇒ aabAaba.
But this sentential form has the suffix aba so it cannot possibly lead to the sentence aabbabba. 


23. E → E + E | E · E | E* | (E) | λ | ∅ | a | b.


Section 5.2 
2. A solution is 

S → aA, A → aAB | b, B → b.
6. There are two leftmost derivations for w = aab.


S ⇒ aaB ⇒ aab,


S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab.


9. From the dfa for a regular language we can get a regular grammar by the method of Theorem 3.4. 
The grammar is an s-grammar except for q_f → λ. But this rule does not create any ambiguity.


Since the dfa never has a choice, there is never any choice in the production that can be applied. 


14. Ambiguity of the grammar is obvious from the derivations 


S ⇒ aSb ⇒ ab


S ⇒ SS ⇒ abS ⇒ ab.
An equivalent unambiguous grammar is


S → A | λ,
A → aAb | ab | AA.


It is not easy to see that this grammar is unambiguous. To make it plausible, consider the two 
typical situations, w = aabb, which can only be derived by starting with A → aAb, and w = abab,
which can only be derived by starting with A → AA. More complicated strings are built from
these two situations, so they can be parsed only in one way. 


20. Solution: 


S → aA | aAA,
A → bAb | bb.


Chapter 6 


Section 6.1 


3. Use the rule in Theorem 6.1 to substitute for B in the first grammar. Then B becomes useless and 
the associated productions can be removed. By Theorems 6.1 and 6.2 the two grammars are 


equivalent. 


8. The only nullable variable is A, so removing λ-productions gives


S → aA | a | aBB,
A → aaA | aa,
B → bC | bbC,
C → B.


C → B is the only unit-production and removing it results in


S → aA | a | aBB,
A → aaA | aa,
B → bC | bbC,
C → bC | bbC.
Finally, B and C are useless, so we get
S → aA | a,
A → aaA | aa.


The language generated by this grammar is L((aa)*a).


13. L(Ĝ) = L(G) − {λ}.


15. An example is 
S → aA,
A → BB,
B → aBb | λ.
When we remove λ-productions we get
S → aA | a,
A → BB | B,


B → aBb | ab.
17. This is obvious since the removal of useless productions never adds anything to the grammar.


22. The grammar S → aA, A → a does not have any useless productions, any unit-productions, or
any λ-productions. But it is not minimal since S → aa is an equivalent grammar.


Section 6.2 


5. First we must eliminate λ-productions. This gives


S → AB | B | aB,
A → aab,
B → bbA | bb.


This has introduced a unit-production, which is not acceptable in the construction of Theorem
6.6. Removal of this unit-production is easy.


S → AB | bbA | aB | bb,
A → aab,
B → bbA | bb.


We can now apply the construction and get


S → AB | V_b V_b A | V_a B | V_b V_b,
A → V_a V_a V_b,
B → V_b V_b A | V_b V_b,


and
S → AB | V_c A | V_a B | V_b V_b,
A → V_a V_d,
B → V_c A | V_b V_b,
V_c → V_b V_b,
V_d → V_a V_b,
V_a → a,
V_b → b.


8. Consider the general form for a production in a linear grammar
A → a1a2...an B b1b2...bm.

Introduce a new variable V1 with the productions

V1 → a2...an B b1b2...bm
and

A → a1V1.

Continue this process, introducing V2 and

V2 → a3...an B b1b2...bm


and so on, until no terminals remain on the left. Then use a similar process to remove terminals
on the right.


9. This normal form can be reached easily from CNF. Productions of the form A → BC are permitted
since a = λ is possible. For A → a, create new variables V1, V2 and productions A → aV1V2, V1
→ λ, V2 → λ.


12. Solution: S → aV_b | aS | aV_a S, V_a → a, V_b → b.


15. Only A → bABC is not in the required form, so we introduce A → bAV and V → BC. The latter
is not in correct form, but after substituting for B, we have


S → aSA,
A → bAV,
V → bC,
B → b,
C → aBC.


Section 6.3 


2. Since aab is a prefix of the string in Example 6.11, we can use the V_ij computed there. Since S ∈
V_13, the string aab is in the language generated by the grammar and can therefore be parsed.


For parsing, we determine the productions that were used in justifying S ∈ V_13:


S ∈ V_13 because S → AB, with A ∈ V_11 and B ∈ V_23,
A ∈ V_11 because A → a,
B ∈ V_23 because B → AB, with A ∈ V_22, B ∈ V_33,
A ∈ V_22 because A → a,
B ∈ V_33 because B → b.


This shows all the productions needed to justify membership; these can then be used in the
parsing


S ⇒ AB ⇒ aB ⇒ aAB ⇒ aaB ⇒ aab.
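The table of sets used in this argument can be recomputed with a few lines of code. The sketch below is a minimal CYK-style table builder that uses only the four productions cited in this answer (S → AB, A → a, B → AB, B → b); the grammar of Example 6.11 may contain further productions, so treating these four as the whole grammar is an assumption. Indices run from 1, as in the text, and the names `UNIT`, `PAIR`, and `cyk_table` are illustrative.

```python
# Productions cited in the answer, split by shape.
UNIT = {"a": {"A"}, "b": {"B"}}            # A -> a, B -> b
PAIR = {("A", "B"): {"S", "B"}}            # S -> AB, B -> AB


def cyk_table(w: str) -> dict:
    """Return V[(i, j)], the variables deriving w_i ... w_j (1-indexed, inclusive)."""
    n = len(w)
    V = {}
    for i in range(1, n + 1):
        V[(i, i)] = set(UNIT.get(w[i - 1], set()))
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            # Try every split point k with w_i..w_k and w_(k+1)..w_j.
            for k in range(i, j):
                for left in V[(i, k)]:
                    for right in V[(k + 1, j)]:
                        cell |= PAIR.get((left, right), set())
            V[(i, j)] = cell
    return V
```

For w = aab the entry `cyk_table("aab")[(1, 3)]` contains S, and the entries for (1, 1), (2, 2), (3, 3), and (2, 3) contain the variables listed in the justification above.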


Chapter 7 


Section 7.1 


2. The key to the argument is the switch from q0 to q1, which is done nondeterministically and need
not happen in the middle of the string. However, if a switch is made at some other point or if the
input is not of the form ww^R, an accepting configuration cannot be reached. Suppose the content of
the stack at the time of the switch is x1x2...xnz. To accept a string we must get to the configuration
(q1, λ, z). By examining the transition function, we see that we can get to this configuration only if
at this point the unread part of the input is x1x2...xn, that is, if the original input is of the form ww^R
and the switch was made exactly in the middle of the input string.


4. (a) The solution is obtained by letting each a put two markers on the stack, while each b consumes 


one. Solution: 


δ(q0, λ, z) = {(qf, z)},
δ(q0, a, z) = {(q1, 11z)},
δ(q1, a, 1) = {(q1, 111)},
δ(q1, b, 1) = {(q1, λ)},
δ(q1, λ, z) = {(qf, z)}.


(f) Here we use nondeterminism to generate one, two, or three tokens by
δ(q0, a, z) = {(q1, 1z), (q1, 11z), (q1, 111z)}
and
δ(q1, a, 1) = {(q1, 11), (q1, 111), (q1, 1111)}.
The rest of the solution is then essentially the same as 4(a).


9. This is a pda that makes no use of the stack, so that it is, in effect, a finite accepter. The state
transitions can then be taken directly from the pda, to give 


Ô (qo, a) = 4. 
ô (q0, b) = qo. 
ó (gi a) = Gg, 
ô (q1, b) = qo. 


11. Trace through the process, taking one path at a time. The transition from gp to q) can be made 


with a single a. The alternative path requires one a, followed by one or more b’s, terminated by
an a. These are the only choices. The pda therefore accepts the language


L = {a} UL (abb*a). 


14. Here we are not allowed enough states to track the switch from a’s to b’s and back. To
overcome this, we put a symbol in the stack that remembers where in the sequence we are. For 
example, a solution is 


ô (qoa, z) = {(go.1)} 
ô (q0, 1) = {(q0,1)}, 
5 (ao L= {(qo,2)}, 
d(qo,a,2) = { (q0, 2)} 
Ò (q0, À, 2) = { (qaf, 2)} 


We have only two states, the initial state q0 and the accepting state qf. What would normally be
tracked by different states is now tracked by the symbol in the stack.


16. Here we use internal states to remember symbols to be put on the stack. For example, 


δ(qi, a, b) = {(qj, cde)}
is replaced by


δ(qi, a, b) = {(qjc, de)},
δ(qjc, λ, d) = {(qj, cd)}.


Since ô can have only a finite number of elements and each can only add a finite amount of 
information to the stack, this construction can be carried out for any pda. 


Section 7.2 


3. You can follow the construction of Theorem 7.1 or you can notice that the language is {a” +b"! : 
n> 0}. With the latter observation we get a solution 


d(qo,a,z) = {(q1.2)}. 
6(qi,a,z) = {(qo.z)}., 
6(q2,a,2z) = {(q2,11z)} 
ô (qo,a,1) = {(q2.111)}, 
ô (q2,b,1) = {(q3,1)} 

ô (q3.6,1) = {(q3.A)}, 
6(q3,A,z) = {(az,2)} 


where q0 is the initial state and qf is the final state.


4. First convert the grammar into Greibach normal form, giving S → aSSS; S → aB; B → b. Then
follow the construction of Theorem 7.1.


δ(q0, λ, z) = {(q1, Sz)},
δ(q1, a, S) = {(q1, SSS), (q1, B)},
δ(q1, b, B) = {(q1, λ)},
δ(q1, λ, z) = {(qf, z)}.
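The behavior of an npda built this way can be explored with a small simulator. The sketch below assumes the usual shape of such a construction for the grammar S → aSSS | aB, B → b: a λ-move that pushes S on top of the stack start symbol z, one move per production, and a λ-move to a final state qf when only z remains. The state names q0, q1, qf and the function name `accepts` are assumptions made for illustration.

```python
from collections import deque

LAMBDA = ""  # stands for the empty string λ

# (state, input symbol or λ, stack top) -> set of (new state, string pushed in place of the top)
DELTA = {
    ("q0", LAMBDA, "z"): {("q1", "Sz")},
    ("q1", "a", "S"): {("q1", "SSS"), ("q1", "B")},
    ("q1", "b", "B"): {("q1", LAMBDA)},
    ("q1", LAMBDA, "z"): {("qf", "z")},
}


def accepts(w: str, start: str = "q0", final=("qf",)) -> bool:
    """Breadth-first search over configurations (state, input position, stack)."""
    seen = set()
    queue = deque([(start, 0, "z")])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if state in final and pos == len(w):
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        # A move either reads nothing (λ) or reads the next input symbol.
        moves = [(LAMBDA, pos)]
        if pos < len(w):
            moves.append((w[pos], pos + 1))
        for symbol, next_pos in moves:
            for new_state, push in DELTA.get((state, symbol, top), ()):
                queue.append((new_state, next_pos, push + rest))
    return False
```

The search terminates here because every move other than the first and last consumes an input symbol. For example, the string aababab, derived by S ⇒ aSSS followed by three uses of S ⇒ aB ⇒ ab, is accepted.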


7. From Theorem 7.2, given any npda, we can construct an equivalent context-free grammar. From 
that grammar we can then construct an equivalent three-state npda, using Theorem 7.1. Because of 
the transitivity of equivalence, the original and the final npda's are also equivalent. 


9. We first obtain a grammar in Greibach normal form for L, for example, S → aSB | b, B → b. Next,
we apply the construction in Theorem 7.1 to get an npda with three states, q0, q1, qf. The state q1
can be eliminated if we use a special stack symbol z1 to mark it. A complete solution is


δ(q0, λ, z) = {(q0, Sz1)},
δ(q0, a, S) = {(q0, SB)},
δ(q0, b, S) = {(q0, λ)},
δ(q0, b, B) = {(q0, λ)},
δ(q0, λ, z1) = {(qf, λ)}.


11. There must be at least one a to get started. After that, ô (qo, a, A) = {(qo, A)} simply reads a’s 


without changing the stack. Finally, when the first b is encountered, the pda goes into state q1,
from which it can only make a λ-transition to the final state. Therefore, a string will be accepted
if and only if it consists of one or more a’s, followed by a single b. 


Section 7.3 


4. At first glance, this may seem to be a nondeterministic language, since the prefix a calls for two 
different types of suffixes. Nevertheless, the language is deterministic, as we can construct a 
dpda. This dpda goes into a final state when the first input symbol is an a. If more symbols 


follow, it goes out of this state and then accepts a^n b^n. Complete solution:


5 (q0,a,2) = {(as,12)}, 

ô (q3,a, 1 {(qi,11)}. 

6(qi,a,1) = {(m,11)}., 
1 {(q1,A)}, 

{ 


(q2, z)}, 


Ò (qi, b.1) 


ô { qi, À.z <= 
where F = {q2, q3}.


9. The solution is straightforward. Put a’s and b's on the stack. The c signals the switch from saving 
to matching, so everything can be done deterministically. 


11. There are two states, the initial, non-accepting state q0 and the final state q1. The pda will be in
state q1 unless a z is on top of the stack. When this happens, the pda will switch states to q0. The
rest is essentially the same as Example 7.3. Thus we have δ(q0, a, z) = {(q1, 0z)}, δ(q1, a, 0) =
{(q1, 00)}, etc. with δ(q1, λ, z) = {(q0, z)}. When you write this all out, you will see that the pda
is deterministic.


15. This is obvious since every regular language can be accepted by a dfa and such a dfa is a dpda 
with an unused stack. 


16. The basic idea here is to combine a dpda with a dfa along the lines of the construction in 
Theorem 4.1, with the stack handled as it is for Z,. It should not be too hard to see that the result 


is a dpda. 


Section 7.4 


2. Consider the strings aabb and aabbbbaa. In the first case, the derivation must start with S ⇒ aSb,
while in the second S ⇒ SS is the necessary first step. But if we see only the first four symbols,
we cannot decide which case applies. The grammar is therefore not in LL(4). Since similar
examples can be made for arbitrarily long strings, the grammar is not LL(k) for any k.


4. Look at the first three symbols. If they are aaa, aab, or aba, then the string can only be in L(a*ba).
If the first three symbols are abb, then any parsable string must be in L(abbb*). For each case, we
can find an LL grammar and the two can be combined in an obvious fashion. A solution is


S → S1 | S2,
S1 → aS1 | ba,
S2 → abbB,
B → bB | λ.


Looking at the first three symbols tells us if S → S1 or S → S2 is necessary. The grammar is
therefore LL(3).


7. For a deterministic CFL there exists a dpda. When this dpda is converted into a grammar, the 
grammar is unambiguous. 
9. (a) 
S → aSc | S1 | λ,


S1 → bS1c | λ.


This is almost an s-grammar. As long as the currently scanned symbol is a, we must apply S →
aSc; if it is b, we must use S → S1; if it is c, we can only use S → λ. The grammar is LL(1).


Chapter 8 


Section 8.1 


3. Take w = a^m b^m b^m a^m a^m b^m. The adversary now has several choices that have to be considered. If,
for example, v = a^k and y = a^l, with v and y located in the prefix a^m, then


w_0 = a^(m−k−l) b^m b^m a^m a^m b^m,

which is not in L. There are a number of other possible choices, but in all cases the string can be 
pumped out of the language. 


7. (a) Use the pumping lemma. Given m, pick w = a^(m^2) b^m. The only choice of v and y that needs any
serious examination is v = a^k and y = b^l, with k and l nonzero. Suppose that l = 1. Then choose i =
2, so that w_2 has m^2 + k a’s and m + 1 b’s. But


(m + 1)^2 = m^2 + 2m + 1
> m^2 + k,
so w_2 is not in the language. Similar arguments hold also for l > 1. Therefore, the language
cannot be context-free.
(f) Given m, choose w = a^m b^(m+1) c^(m+2), which is easily pumped out of the language.
8. (b) The language is not context-free. Use the pumping lemma with w = a^m b^m a^m b^m and examine


various choices of v and y.


10. Perhaps surprisingly, this language is context-free. Construct an npda that counts to some value k 
(by putting k tokens on the stack) and remembers the kth symbol. It then examines the kth symbol
in w2. If this does not match the remembered symbol, the string is accepted. If w € L, there must 
be some k for which this happens. The npda chooses the k nondeterministically. 


12. Use the pumping lemma for linear languages. With a given m, choose w = a^m b^(2m) a^m. Now v and y
are entirely made of a’s, so w is easily pumped out of the language.


15. The language is not linear. With the pumping lemma, use 
w = (...(a)...) + (...(a)...)


where (...( and )...) stand for m left or right parentheses, respectively. If |u| ≥ 1, we can easily
pump so that for some prefix v, n_((v) < n_)(v), which results in an improper expression. Similar
arguments hold for other decompositions. 


20. Use w = a^(pq), where p and q are primes such that p > m and q > m. If |vy| = k, then
|w_(i+1)| = pq + ik.
If we choose i = pq, then


w_(i+1) = a^(pq(1+k)),


which is not in the language. 


Section 8.2 


1. The complement is context-free. The complement involves two cases: n_a(w) ≠ n_b(w) and n_a(w)
≠ n_c(w). These in turn can be broken into n_a(w) > n_b(w), n_a(w) > n_c(w), n_a(w) < n_b(w), and
n_a(w) < n_c(w). Each of these is context-free as can be shown by construction of a CFG. The full
language is then the union of these four cases and by closure under union is context-free. 


. Given a context-free grammar G, construct a context-free grammar Ĝ by replacing every
production A → x by A → x^R. We can then show by an induction on the number of steps in a
derivation that if w is a sentential form for G, then w^R is a sentential form for Ĝ.
9. Given two linear grammars G1 = (V1,T,S1,P1) and G2 = (V2,T,S2,P2) with V1 ∩ V2 = ∅, form the
combined grammar Ĝ = (V1 ∪ V2, T, S, P1 ∪ P2 ∪ {S → S1|S2}). Then Ĝ is linear and L(Ĝ) = L(G1)
∪ L(G2).
To show that linear languages are not closed under concatenation, take the linear language L =
{a^n b^n : n ≥ 1}. The language L^2 is not linear, as can be shown by an application of the pumping
lemma.


13. Let G1 = (V1,T,S1,P1) be a linear grammar for L1 and let G2 = (V2,T,S2,P2) be a left-linear
grammar for L2. Construct a grammar Ĝ2 from G2 by replacing every production of the form V →
x, x ∈ T*, with V → S1x. Combine grammars G1 and Ĝ2, choosing S2 as a start symbol. It can then
be shown that in this grammar


S2 ⇒* S1w ⇒* uw


if and only if u ∈ L1 and w ∈ L2.


15. The languages L1 = {a^n b^n c^m} and L2 = {a^n b^m c^m} are both unambiguous. But their intersection is
not context-free.


21. λ ∈ L(G) if and only if S is nullable.


Chapter 9 


Section 9.1 


2. A three-state solution that scans the entire input is 


Ae A I EE A A 
ô (q,a) = (q1, b) = (q1,0, R), 


å (g H= (q@2.0, R), 


It is also possible to get a two-state solution by just examining the first symbol and ignoring the 
rest of the input, for example, 


ô (q0,a) = (q2,a, R). 
Notice that in a Turing machine it is not necessary to examine the entire input before accepting it. 


7. (a) 


ô (qo.a) = (qi.a,R). 


ô (q1,5) = (qo. 6, R), 
ô (q,a) = (qo.a,R), 
ô (q2,b) = (q3,b, R). 
(b) 
â (go,a) = ô ( qo: b)= ( M1; O.R), 
å (q0. 0O) = (œ, 0O, R), 
6(qi,a) = ô (q1, b) = (q0, O, R), 


10. The solution is conceptually simple, but tedious to write out in detail. The general scheme looks 
something like this: 
(i) Place a marker symbol c at each end of the string. 


(ii) Replace the two-symbol combination ca on the left by ac and the two-symbol combination 
ac on the right by ca. Repeat until the two c’s meet in the middle of the string. 


(iii) Remove one of the c’s and move the rest of the string to fill the gap. 
Obviously this is a long job, but it is typical of the cumbersome ways in which Turing machines


often do simple things. 


12. We cannot just search in one direction since we don't know when to stop. We must proceed in a 
back-and-forth fashion, placing markers at the right and left boundaries of the searched region 
and moving the markers outward. 


19. If the final state set F contains more than one element, introduce a new final state qf and the
transitions


δ(q, a) = (qf, a, R)


for all q ∈ F and a ∈ Γ.


Section 9.2 


3. (a) We can think of the machine as constituted of two main parts, an add-one machine that just 
adds one to the input, and a multiplier that multiplies two numbers. Schematically they are 
combined in a simple fashion. 


[Figure: block diagram combining the add-one machine and the multiplier to compute n(n + 1)]


5. (c) First, split the input into two equal parts. This can be done as suggested in Exercise 10, 
Section 9.1. Then compare the two parts, symbol by corresponding symbol until a mismatch 
is found. 


8. A solution: 


δ(q0, a) = (qi, a, R),
δ(q0, c) = (q0, c, R) for all c ∈ Σ − {a},


δ(q0, □) = (qj, □, R).


The state q0 is any state in which the searchright instruction may be applied.


Section 9.3 


2. We have ignored the fact that a Turing machine, as defined so far, is deterministic, while a pda can 
be nondeterministic. Therefore, we cannot yet claim that Turing machines are more powerful than 
pushdown automata.


Chapter 10 


Section 10.1 


4. (a) The machine has a transition function 


δ : Q × Γ → Q × Γ × {L, R, S}


with the restriction that for all transitions δ(qi, a) = (qj, b, L or R), the condition a = b must hold.


(b) To simulate δ(qi, a) = (qj, b, L) with a ≠ b of the standard machine, we introduce new
transitions δ(qi, a) = (qjL, b, S) and δ(qjL, c) = (qj, c, L) for all c ∈ Γ, and so on.

6. We introduce a pseudo-blank B. Whenever the original machine wants to write □, the new
machine writes B. Then, for each δ(qi, □) = (qj, b, L) we add δ(qi, B) = (qj, b, L), and so on. Of
course, the original transition δ(qi, □) = (qj, b, L) must be retained to handle blanks that are
originally on the tape.

9. This does not limit the power of the machine. For each symbol a ∈ Γ, we introduce a
pseudo-symbol, say A. Whenever we need to preserve this a, we first write A, then return to the cell in
question to replace A by a.


11. We replace 
δ(qi, {a, b}) = (qj, c, R)
by


δ(qi, d) = (qj, c, R)


for all d ∈ Γ − {a, b}.


Section 10.2 


2. For the formal definition use Γ_T = Γ × Γ × ... × Γ and δ : Q × Γ_T → Q × Γ_T × {L, R}^m, where m is
the number of read-write heads. One issue to consider is what happens when two read-write 


heads are on the same cell. The formal definition must provide for the resolution of possible 
conflicts. 


To simulate the original machine (OM) by a standard Turing machine (SM), we let SM have m +
1 tracks. On one track we will keep the tape contents of the OM, while the other m tracks are 
used to show the position of OM’s tape heads. 


[Figure: tracks of SM; one track holds the tape content of OM, the others mark the positions of tape head #1 and tape head #2]


SM will simulate each move of the OM by scanning and updating its active area. 


5. This exercise shows that a queue machine is equivalent to a standard Turing machine and that 
therefore a queue is a more powerful storage device than a stack. To simulate a standard TM by a 


queue machine, we can, for example, keep the right side of the OM in the front of the queue, the 
left side in the back. 


[Figure: the tape of the original machine with its read-write head, and its simulation by a queue]


A right move is easy, as we just remove the front symbol in the queue and place something in the 
back. A left move, however, goes against the grain, so the queue contents have to be circulated 


several times to get everything in the right place. It helps to use additional markers Y and Z to 
denote boundaries. For example, to simulate 


δ(qi, c) = (qj, z, L)
we carry out the following steps.
(i) Remove c from the front and add zY to the back.
(ii) Circulate contents to get bzY defexa. 


(iii) Add Z to the back, then circulate, discarding Y and Z as they come to the front. 
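
The three steps can be traced on a small example. The sketch below is only an illustration under stated assumptions, not a faithful queue machine: it assumes the queue stores the right part of the tape, then a separator x, then the left part (so a tape abcdef with the head on c is held as cdefxab), and it peeks at the third queue entry to know when to stop circulating in step (ii), a shortcut that a real queue machine would have to replace by extra circulations. The name left_move is hypothetical.

```python
from collections import deque

def left_move(queue, write):
    """Simulate a left move that overwrites the scanned symbol with `write`.

    Assumed layout (not from the text): <right part> x <left part>,
    with the scanned symbol at the front and a nonempty left part.
    """
    q = deque(queue)
    q.popleft()                 # (i) remove the scanned symbol c ...
    q.extend([write, "Y"])      #     ... and add zY to the back
    # (ii) circulate until the cell left of the head is at the front,
    #      i.e. the queue reads  b z Y ...  (peeking is a shortcut)
    while q[2] != "Y":
        q.append(q.popleft())
    # (iii) add Z, then circulate, discarding Y and Z at the front
    q.append("Z")
    while True:
        symbol = q.popleft()
        if symbol == "Y":
            continue            # drop the marker Y
        if symbol == "Z":
            break               # drop Z and stop
        q.append(symbol)
    return "".join(q)

print(left_move("cdefxab", "z"))
```

In this trace the queue ends up holding the new right part (b, the written z, and def), the separator, and the shortened left part a.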


9. We need just two tapes, one that mirrors the tape of the OM, the second that stores the state of the 
OM. 


[Figure: configuration of OM and the corresponding configuration of SM.] 


SM needs only two states: an accepting and a nonaccepting state. 


Section 10.3 


3. (i) Start at the left of the input. Remember the symbol by putting the machine in the appropriate 
state. Then replace it with X. 


(ii) Move the read-write head to the right, stopping (nondeterministically) at the center of the 
input. 

(iii) Compare the symbol there with the remembered one. If they match, write Y in the cell. If 
they don't match, reject the input. 


(iv) With the center of the input marked with Y, we can now proceed deterministically, 
alternately moving left and right, comparing symbols. 


For a completely deterministic solution, we first find the center of the input (e.g., by putting 
markers at each end, and moving them inward until they meet). 


6. Choose a value for n. To do this, generate 1, 2, ..., stopping nondeterministically at some n. 
Determine if the length of the input is a multiple of n. If it is, accept. If the input is in L, then there is some n 
for which this works. 


7. One stack will keep the contents of the tape to the right of some reference point, the other stack the 
tape contents to the left. Left and right moves are then done simply by popping and pushing the 
stacks. 
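
A minimal sketch of this idea, assuming the reference point is the current head position: one Python list acts as the stack of cells to the left, the other holds the scanned cell and everything to its right. The class name TwoStackTape and the blank symbol "#" are illustrative choices, not part of the exercise.

```python
class TwoStackTape:
    """A Turing machine tape represented by two stacks (Python lists)."""

    def __init__(self, contents, blank="#"):
        self.blank = blank
        self.left = []                                     # top = cell just left of the head
        self.right = list(reversed(contents)) or [blank]   # top = scanned cell

    def read(self):
        return self.right[-1]

    def write(self, symbol):
        self.right[-1] = symbol

    def move_right(self):
        # Pop the scanned cell from the right stack, push it on the left.
        self.left.append(self.right.pop())
        if not self.right:                # ran off the written part: supply a blank
            self.right.append(self.blank)

    def move_left(self):
        # Pop from the left stack (or a blank) and push it on the right.
        self.right.append(self.left.pop() if self.left else self.blank)


tape = TwoStackTape("abc")
tape.move_right()
tape.write("X")
tape.move_left()
print(tape.read())   # the head is back on the first cell
```

Each head move costs one pop and one push, which is why the simulation is direct.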


Section 10.4 
3. An algorithm, in outline, is as follows. 
(i) Start with a copy of the preceding string. 
(ii) Find the rightmost 0. Change it to a 1. Then change all the 1’s to the right of this to 0’s. 
(iii) If there are no 0’s, change all 1’s to 0’s and add a 1 on the left. 
(iv) Repeat from step (i). 
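
Steps (i) to (iii) can be written out directly on Python strings. This is a sketch of the outline exactly as stated, not a Turing machine program; the names successor and enumerate_strings are hypothetical, and starting the enumeration from "0" is an assumption.

```python
from itertools import islice

def successor(s):
    """Return the string that follows s, using steps (i)-(iii) of the outline."""
    bits = list(s)                       # (i) start with a copy of the preceding string
    i = len(bits) - 1
    while i >= 0 and bits[i] != "0":     # (ii) find the rightmost 0
        i -= 1
    if i >= 0:
        bits[i] = "1"                    # change it to a 1
        for j in range(i + 1, len(bits)):
            bits[j] = "0"                # change all 1's to its right to 0's
        return "".join(bits)
    return "1" + "0" * len(bits)         # (iii) no 0's: all 0's plus a 1 on the left

def enumerate_strings(start="0"):
    """(iv) repeat forever, yielding one string per round."""
    s = start
    while True:
        yield s
        s = successor(s)

print(list(islice(enumerate_strings(), 5)))
```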


8. Let S_1 = {s_1, s_2, ...} and S_2 = {t_1, t_2, ...}. Then their union can be enumerated by 

S_1 ∪ S_2 = {s_1, t_1, s_2, t_2, ...}. 

If some s_i = t_j, we list it only once. The union of the two sets is therefore countable. For S_1 × S_2, 
use the ordering in Figure 10.17. 
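
The interleaving for the union can be sketched with generators. This assumes both sets are given as infinite enumerations; the function name enumerate_union and the two sample sets are illustrative only.

```python
from itertools import count, islice

def enumerate_union(s, t):
    """Yield s_1, t_1, s_2, t_2, ... and skip anything already listed."""
    seen = set()
    for a, b in zip(s, t):          # take one element from each enumeration in turn
        for x in (a, b):
            if x not in seen:       # list repeated elements only once
                seen.add(x)
                yield x

evens = (2 * i for i in count())            # 0, 2, 4, ...
multiples_of_3 = (3 * i for i in count())   # 0, 3, 6, ...
print(list(islice(enumerate_union(evens, multiples_of_3), 6)))
```

Every element of either set appears after finitely many steps, which is all that countability requires.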


Section 10.5 


2. First, divide the input by two and move the result to one part of the tape. This free space, initially 
occupied by the input, can then be used to store successive divisors. 


4. (e) Use a three-track machine as shown below. On the third track, we keep the current trial 
value for |w|. On the second track, we place dividers every |w| cells. We then compare the 
cell contents between the markers. 


[Figure: three-track tape. Track 1: input; track 2: dividers; track 3: trial value of |w|.] 


6. Use Exercise 15, Section 6.2, to find a grammar in two-standard form. Then use the construction in 
Theorem 7.1. The pda we get from this consumes one input symbol on every move and never 
increases the stack contents by more than one symbol each time. 


Chapter 11 


Section 11.1 


2. We know that the union of two countable sets is countable and that the set of all recursively 
enumerable languages is countable. If the set of all languages that are not recursively enumerable 
were also countable, then the set of all languages would be countable. But this is not the case, as 
we know. 


6. Let L_1 and L_2 be two recursively enumerable languages and M_1 and M_2 be the respective Turing 
machines that accept these two languages. When presented with an input w, we 
nondeterministically choose M_1 or M_2 to process w. The result is a Turing machine that accepts 
L_1 ∪ L_2. 


11. A context-free language is recursive, so by Theorem 11.4 its complement is also recursive. 
Note, however, that the complement is not necessarily context-free. 


14. For any given w, consider all splits w = w_1w_2...w_n. For each split, determine whether or 
not every w_i ∈ L. Since for each w there are only a finite number of splits, we can decide whether or 
not w ∈ L+. 
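
A short sketch of the decision procedure, assuming the closure in question is the positive closure (one or more factors from L) and that a total membership test for L is available as a Python function in_L. Rather than listing every split separately, the code checks prefixes one after another, which covers the same finite set of splits. The name in_plus_closure is hypothetical.

```python
def in_plus_closure(w, in_L):
    """Decide whether w is a concatenation of one or more strings from L."""
    if w == "":
        return in_L("")               # the empty string can only come from empty factors
    n = len(w)
    # splittable[j] is True when w[:j] can be cut into pieces that all lie in L
    splittable = [False] * (n + 1)
    splittable[0] = True
    for j in range(1, n + 1):
        splittable[j] = any(splittable[i] and in_L(w[i:j]) for i in range(j))
    return splittable[n]

in_L = lambda s: s in {"ab", "c"}     # a small sample language
print(in_plus_closure("abcab", in_L), in_plus_closure("abca", in_L))
```

Because in_L always halts and there are only finitely many cut points, the procedure always halts as well.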


18. The argument attempting to show by diagonalization that 2^S is not countable for finite S fails 
because the table in Figure 11.2 is not square, having |2^S| rows and |S| columns. 


[Figure: table with |S| columns and 2^|S| rows.] 


When we diagonalize, the result on the diagonal could be in one of the rows below. 


Section 11.2 
1. Look at a typical derivation: 

S ⇒* aS_1bB ⇒ aaS_1bbB ⇒* a^n S_1 b^n B ⇒ a^(n+1) b^(n−1) B ⇒ a^(n+1) b^(n+1) B ⇒ .... 

From this it is not hard to conjecture that the grammar derives 

L = {a^(n+1) b^(n+k) : n ≥ 1, k = −1, 1, 3, ...}. 


3. Formally, the grammar can be described by G = (V, S, T, P), with S ⊆ (V ∪ T)+ and 

L(G) = {x ∈ T* : s ⇒*_G x for any s ∈ S}. 

The unrestricted grammars in Definition 11.3 are equivalent to this extension because to any 
given unrestricted grammar we can always add starting rules S_0 → s_i for all s_i ∈ S. 


7. To get this form for unrestricted grammars, insert dummy variables on the right whenever |u| > |v|. 
For example, 

AB → C 

can be replaced by 

AB → CD, 
D → λ. 


The equivalence argument is straightforward. 


Section 11.3 
1. (c) Working with context-sensitive grammars is not always easy. The idea of a messenger, 
introduced in Example 11.2, is often useful. In this problem, the first step is to create the 
sentential form a^n B c^n D. The variables B and D will act as markers and messengers to assure 
that the correct number of b's and d's are created in the right places. The first part is 
achieved easily with the productions 

S → aAcD | aBcD, 
A → aAc | aBc. 

In the next step, the B travels to the right to meet the D, by 

Bc → cB, 
Bb → bB. 

When that happens, we can create one d and a return messenger that will put the b in the right 
place and stop. 

BD → Ed, 
cE → Ec, 
bE → Eb, 
aE → ab. 

Alternatively, we create a d plus a marker D, with a different messenger that creates a b, but 
keeps the process going: 

BD → FDd, 
cF → Fc, 
bF → Fb, 
aF → abB. 
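
One way to gain confidence in such a grammar is to search all derivations up to a length bound. The sketch below reads the productions as S → aAcD | aBcD, A → aAc | aBc, Bc → cB, Bb → bB, BD → Ed, cE → Ec, bE → Eb, aE → ab, BD → FDd, cF → Fc, bF → Fb, aF → abB (a reading of the printed rules, so treat it as an assumption), and it relies on the fact that none of them shortens a sentential form, so forms longer than the bound can be discarded. The name terminal_strings is hypothetical.

```python
from collections import deque

RULES = [
    ("S", "aAcD"), ("S", "aBcD"), ("A", "aAc"), ("A", "aBc"),
    ("Bc", "cB"), ("Bb", "bB"),                                # B travels right
    ("BD", "Ed"), ("cE", "Ec"), ("bE", "Eb"), ("aE", "ab"),    # final messenger
    ("BD", "FDd"), ("cF", "Fc"), ("bF", "Fb"), ("aF", "abB"),  # repeating messenger
]

def terminal_strings(max_len, start="S"):
    """Breadth-first search over sentential forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if form.islower():               # only terminals left: a derived sentence
            found.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:             # apply the rule at every possible position
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return found

print(sorted(terminal_strings(6)))
```

With a bound of 6 the search yields abcd, abbcdd, and aabccd, each with matching numbers of a's and c's and of b's and d's, as the messenger argument predicts.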


4. The easiest argument is from an lba. Suppose that a language L is context-sensitive. Then there exists 
an lba M that accepts it. Given w, we first rewrite it as w^R, then apply M to it. Because L^R = {w : 
w^R ∈ L}, M accepts w^R if and only if w ∈ L^R. The machine that reverses a string and applies M is 
an lba. Therefore, L^R is context-sensitive. 


6. We can argue from an lba. Clearly, there is an lba that can recognize any string of the form wuw^R. 
Just start at opposite ends and compare symbols to get a match. Find the longest possible w, then 
compare its length with u. Since there is an lba, the language is context-sensitive and a 
context-sensitive grammar must exist. 


Chapter 12 


Section 12.1 


3. Given M and w, modify M to get M̂, which halts if and only if a special symbol, say an introduced 
symbol #, is written. We can do this by changing the halting configurations of M so that every one 
writes #, then stops. Thus, M halts implies that M̂ writes #, and M̂ writes # implies that M halts. 
Thus, if we have an algorithm that tells us whether or not a specified symbol a is ever written, we 
apply it to M̂ with a = #. This would solve the halting problem. 

7. Given (M, w), modify M to M̂ so that (M, w) halts if and only if M̂ accepts some simple language, 
say {a}. This can be done by M̂ first checking the input and remembering whether the input was a. 
Then M̂ carries out the normal computations of M on w. When it halts, check if the input was a. Accept if so, 
reject otherwise. Therefore, M̂ accepts {a} if and only if (M, w) halts. Now construct a simple Turing 
machine, say M_1, that accepts {a}. If we had an algorithm that checks for the equality of two 
languages, we could use it to see if L(M̂) = L(M_1). If L(M̂) = L(M_1), then (M, w) halts. If 
L(M̂) ≠ L(M_1), then (M, w) does not halt and we have a solution to the halting problem. 

10. Given (M, w) we modify M so that it always halts in the configuration q_f w. If the given problem 
were decidable, we could apply the supposed algorithm to the modified machine, with 
configurations q_0 w and q_f w. This would give us a solution of the halting problem. 


13. Take a universal Turing machine and let it simulate computations on an empty tape. Whenever 
the simulated computations halt, accept the Turing machine being simulated. The universal 
Turing machine is therefore an accepter for all Turing machines that halt when applied to a blank 
tape. The set is therefore recursively enumerable. 


Suppose now the set were recursive. There would then exist an algorithm A that lists all Turing 
machines that halt on a blank tape input in some order of increasing lengths of the program. See 
if the original Turing machine is amongst the Turing machines generated by A. Since the length of 
the original program is fixed, the comparison will stop when this length is exceeded. Thus, we 
have a solution to the blank tape halting problem. 


16. If the specific instances of the problem are p_1, p_2, ..., p_n, we construct a Turing machine that 
behaves as follows: 

if problem = p_1 then return false, 
if problem = p_2 then return true, 
... 
if problem = p_n then return true. 


Whatever the truth values of the various instances are, there is always some Turing machine that 
gives the right answer. Remember that it is not necessary to know what that Turing machine is, 
only to guarantee that it exists. 


Section 12.2 


3. Suppose we had an algorithm to decide whether or not L(M_1) ⊆ L(M_2). We could then construct a 
machine M_2 such that L(M_2) = ∅ and apply the algorithm. Then L(M_1) ⊆ L(M_2) if and only if 
L(M_1) = ∅. But this contradicts Theorem 12.3, since we can construct M_1 from any given 
grammar G. 


6. If we take L(G_2) = Σ*, the problem becomes the question of Theorem 12.3 and is therefore 
undecidable. 


8. Since there are some grammars for which L(G) = L(G)* and some for which this is not so, the 
undecidability follows from Rice's theorem. To do this from first principles is a little harder. 
Take the halting problem (M, w) and modify it (along the lines of Theorem 12.4), so that if (M, w) 
halts, M̂ will accept {a}* and if (M, w) does not halt, M̂ accepts ∅. From M̂ get the grammar Ĝ 
by the construction leading to Theorem 11.7. If L(M̂) = {a}*, then L(Ĝ) = L(Ĝ)* = {a}*. But if 
L(M̂) = ∅, then L(Ĝ) = ∅ while L(Ĝ)* = {λ}, so L(Ĝ) ≠ L(Ĝ)*. Therefore, if this problem were decidable, we 
could get a solution of the halting problem. 


Section 12.3 


1. A PC-solution is w_3w_4w_1 = v_3v_4v_1. There is no MPC solution because one string would have a 
prefix 001, the other 01. 


3. If |w_i| > |v_i| for all i, or |w_i| < |v_i| for all i, then clearly there is no solution. If this condition does not hold, 
then either |w_i| = |v_i| for some i, which has a trivial solution, or there must exist a j and a k such 
that |w_j| > |v_j| and |w_k| < |v_k|. In the latter case, the PCP has a solution w_j^r w_k^s, where 
r = |v_k| − |w_k| and s = |w_j| − |v_j|. 
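
The length argument can be checked numerically. The sketch below assumes the strings are over a one-letter alphabet, so that each string is determined by its length and a solution only has to balance total lengths; the function names and the 0-based indices are illustrative, not from the text.

```python
def unary_pcp_solution(w_lens, v_lens):
    """Return a list of indices solving the instance, or None if there is none.

    w_lens[i] and v_lens[i] are the lengths |w_i| and |v_i| (0-based i).
    """
    pairs = list(zip(w_lens, v_lens))
    for i, (w, v) in enumerate(pairs):
        if w == v:
            return [i]                          # trivial solution
    longer = [i for i, (w, v) in enumerate(pairs) if w > v]
    shorter = [i for i, (w, v) in enumerate(pairs) if w < v]
    if not longer or not shorter:
        return None                             # all w longer, or all w shorter
    j, k = longer[0], shorter[0]
    r = v_lens[k] - w_lens[k]
    s = w_lens[j] - v_lens[j]
    return [j] * r + [k] * s                    # r copies of j, then s copies of k

def is_solution(indices, w_lens, v_lens):
    """Check that the chosen indices give equal total lengths on both sides."""
    return bool(indices) and (
        sum(w_lens[i] for i in indices) == sum(v_lens[i] for i in indices)
    )

solution = unary_pcp_solution([3, 1], [1, 4])
print(solution, is_solution(solution, [3, 1], [1, 4]))
```

For lengths (3, 1) against (1, 4) this gives r = 3 and s = 2, and both sides have total length 11.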

5. (a) The problem is undecidable. If it were decidable, we would have an algorithm for 
deciding the original MPC problem. Given w_1, w_2, ..., w_n, we form w_1^R, w_2^R, ..., w_n^R and use the 
assumed algorithm. Since 

w_1 w_i ... w_k = (w_k^R ... w_i^R w_1^R)^R, 

the original MPC problem has a 
solution if and only if the new MPC problem has a solution. 


Section 12.5 


1. (a) Find the middle of the string, then go back and forth to match symbols. Both parts take 
O(n^2) moves. 


(b) Count the number of symbols. If the count is done in unary, this is an O(n) operation. Next, 
write the first half on the second tape. This is also an O(n) operation. Finally, you can 
compare in O(n) moves. 


(c) You can guess the middle of the string nondeterministically in one move. But the matching 
still takes O(n^2) moves. 


(d) Guess the middle of the string, then proceed as in (b). Total effort required is O(n). 


Chapter 13 
Section 13.1 
2. Using the function subtr in Example 13.3, we get the solution 

greater(x, y) = subtr(1, subtr(1, subtr(x, y))). 
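
A quick check of this composition, assuming subtr is the usual proper subtraction on the natural numbers (x − y when x > y, and 0 otherwise):

```python
def subtr(x, y):
    """Proper subtraction (monus) on natural numbers, assumed to match Example 13.3."""
    return x - y if x > y else 0

def greater(x, y):
    """Return 1 if x > y and 0 otherwise, built only from subtr and the constant 1."""
    return subtr(1, subtr(1, subtr(x, y)))

print(greater(5, 3), greater(3, 5), greater(4, 4))
```

The inner subtr(x, y) is positive exactly when x > y; the two outer applications collapse any positive value to 1 and keep 0 at 0.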


g(x, y) = mult(x, g(x, y − 1)), 

g(x, 0) = 1. 


9. (a) 

A(1, y) = A(0, A(1, y − 1)) 
        = A(1, y − 1) + 1 
        = A(1, y − 2) + 2 
        ⋮ 
        = A(1, 0) + y 
        = y + 2. 

(b) With the results of part (a) we can use induction to prove the next identity. Assume that for 
y = 1, 2, ..., n − 1, we have A(2, y) = 2y + 3. Then 

A(2, n) = A(1, A(2, n − 1)) 
        = A(1, 2n + 1) 
        = 2n + 3, from part (a). 

Since 

A(2, 0) = A(1, 1) 
        = 3, 

we have a basis and the equation is true for all y. 
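
Both identities are easy to spot-check by computing Ackermann's function directly. The sketch assumes the usual definition A(0, y) = y + 1, A(x, 0) = A(x − 1, 1), A(x, y + 1) = A(x − 1, A(x, y)), which agrees with the steps used in the derivation above.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(x, y):
    """Ackermann's function under the assumed standard definition."""
    if x == 0:
        return y + 1
    if y == 0:
        return A(x - 1, 1)
    return A(x - 1, A(x, y - 1))

# Compare computed values with the closed forms from parts (a) and (b).
print(all(A(1, y) == y + 2 for y in range(20)))
print(all(A(2, y) == 2 * y + 3 for y in range(10)))
```

Only small arguments are used here, since the recursion depth grows quickly with x.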


15. (b) If 2^x + y − 3 = 0, then y = 3 − 2^x. The only values of x that give a positive y are 0 and 1, so 
the domain of μ is {0, 1}, giving a minimum value of y = 1. Therefore, 

μy (2^x + y − 3) = 1. 
Section 13.2 


1. (b) Use C_T = {a, b, c}, C_N = {x}, and A = {x}. The nonterminal x is used as a boundary 
between the left and right sides of the target string and the two w's are built simultaneously 
by 

V_1xV_2 → V_1axV_2a | V_1bxV_2b | V_1cxV_2c. 

At the end, the x is removed by 

V_1xV_2 → V_1V_2. 


3. At every step, the only possible identification of V_1 is with the entire derived string. This results 
in a doubling of the string and 

L = {a^(2^n) : n ≥ 1}. 


5. A solution is 

V_1 * V_2 = V_3 → V_1 1 * V_2 = V_3 V_2, 
V_1 * V_2 = V_3 → V_1 * V_2 1 = V_3 V_1. 

For example 

1 * 1 = 1 ⇒ 11 * 1 = 11 ⇒ 11 * 11 = 1111, 

and so on. 


Section 13.3 


1. 

P_1 : S → S_1S_2, 
P_2 : S_1 → aS_1, S_2 → aS_2, 
P_3 : S_1 → bS_1, S_2 → bS_2, 
P_4 : S_1 → λ, S_2 → λ. 


5. The solution here is reminiscent of the use of messengers with context-sensitive grammars. 


ab — T 


8. While in principle each symbol has to be rewritten at each step, this can be circumvented by a → a. 
Therefore, we can rewrite so that in each step a single a is added, giving the language L(aa*). 


Chapter 14 


Section 14.1 


3. The choice of algorithm is important in sorting. Simple methods, such as the bubblesort, have 
time-complexity O(n^2). The most efficient sorting algorithms have time-complexity O(n log n). 

Section 14.2 

4. (x_1 ∨ x_3) ∧ (x_2 ∨ x_3). 

7. If one configuration grows, the information on the tape has to be moved. Suppose a right shift has 
to be made. Go to the right end of the tape and move every symbol in the active area one cell to 
the right. This takes O(nk^n) moves. If every one of the k^n configurations grows in a single move, 
the complete process requires O(n^2 k^(2n)) moves. Since this is dominated by O(k^(3n)), the conclusion 
of the theorem is unaffected. 


Section 14.3 


4. This is an immediate consequence of Theorem 14.4. 


Section 14.5 


3. It is tempting to say something like “nondeterministically select V_i as first vertex, V_j as second 
vertex, etc.” but this is not correct. While nondeterminism implies a choice, the choice on each 
move is from a limited number of options. Since the i in V_i is arbitrarily large, we cannot do this. 
A better way is to 


(1) Create a list of numbers 1, 2, ..., n in unary notation. The length of the list is O(n^2), so this 
can be done in O(n^2) time. 


(2) Scan the list and nondeterministically select one number. If a number is selected, add it to 
the permutation and remove it from the list. 


(3) Repeat step (2) n times. 


Therefore, the whole process can be done in O(n^3) time. 


Section 14.6 


5. Take the HAMPATH graph and complete it. Weight existing edges with 0 and new edges with 1. 
Apply the TSP algorithm with k = 0. If there is a solution, then there exists a Hamiltonian path. 
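
The reduction can be sketched in a few lines. Several things here are assumptions made only for illustration: the graph is undirected with vertices 0, ..., n − 1, the TSP variant asks for a path through all vertices of total weight at most k, and a brute-force search over permutations stands in for "the TSP algorithm". All function names are hypothetical.

```python
from itertools import permutations

def tsp_weights(n, edges):
    """Complete the graph: existing edges get weight 0, added edges weight 1."""
    edge_set = {frozenset(e) for e in edges}
    return {
        frozenset((u, v)): 0 if frozenset((u, v)) in edge_set else 1
        for u in range(n) for v in range(u + 1, n)
    }

def has_route_of_weight(n, weights, k):
    """Brute-force stand-in for a TSP solver (path variant, assumed)."""
    for order in permutations(range(n)):
        total = sum(weights[frozenset((order[i], order[i + 1]))]
                    for i in range(n - 1))
        if total <= k:
            return True
    return False

def has_hamiltonian_path(n, edges):
    """Decide HAMPATH by calling the TSP routine with k = 0."""
    return has_route_of_weight(n, tsp_weights(n, edges), 0)

print(has_hamiltonian_path(4, [(0, 1), (1, 2), (2, 3)]))   # a simple path graph
print(has_hamiltonian_path(4, [(0, 1), (0, 2), (0, 3)]))   # a star graph
```

A route of weight 0 uses only original edges, so it is a Hamiltonian path in the original graph; building the weights takes O(n^2) time, which keeps the reduction polynomial.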